SRRM2 KD AS analysis

Author

Niccolò Arecco

1 Intro

Alternative splicing (AS) events included in this analysis:

  1. alternatively spliced exons and microexons

  2. intron retention

  3. alternative splice sites (acceptor at 3’ and donor at 5’)

For each sample AS events were quantified using the percentage of sequence inclusion (PSI) metric, which is a number ranging from zero to 100, corresponding to full sequence skipping or full sequence inclusion in all transcripts of a gene, respectively. For example, constitutively spliced exons have a PSI of 100. The PSI cannot be calculated for the very first and last exons of a gene.

These documents reports the RNA-seq downstream AS analysis after processing the samples with VAST-TOOLS(Tapial et al. 2017) for SRRM2 Knockdown (KD) in HeLa(Zhang et al. 2024) (Bo Wang lab) and HepG2(Cui et al. 2023) (Ben Blewncowe, Alan Lambowitz & Paul Schimmel) cells.

Last code execution: 2025 March 03, Monday @ 17:03:52.

2 FASTQ files processing

vast-tools(Tapial et al. 2017) was used to quantify both AS events PSI and gene expression as normalised TPMs using limma::normalizeBetweenArrays and cRPKMs per gene. The analysis was performed using vast-tool v2.5.1.

The raw FASTQ files were processed as following:

2.1 align

vast-tools maps the RNA-seq reads to a predefined set of exon-exon junctions (EEJ). This set of EEJ is an “enriched” version of all ENSEMBL transcripts. This dataset was analysed using Homo sapiens assembly hg38, based on ENSEMBL v88. To do so vast-tools align first trims the reads to 50 nucleotides fragments with a sliding window define by --stepSize (usually 25 or 20 nt). The trimmed reads are aligned to the EEJ libraries using bowtie v1.2. The trimmed reads must have at least 8 nucleotides overlapping the splice site and be uniquely mapping to be considered for quantification.

For this analysis fastq files were processed with vast-tools align with options --sp hg38, ---IR_version 2, --stepSize 20, and --expr. Reads strand detection is done automatically.

2.2 merge

To increase sequencing read depth the vast-tools has a module called merge that pools the individual replicates of each condition into a super sample. This is basically equivalent to concatenating FASTQ files with cat before running align. This helps prevent biases in AS events identification towards highly expressed genes. For this analysis the 3 individual replicates per condition were merged.

2.3 combine

To build the main output containing all detected AS events the vast-tools combine module takes intermediate output files and generate a table named as INCLUSION_LEVELS_FULL-<SPECIES_ASSEMBLY>-<SAMPLES_NUM>-v251.tab. This file contains an AS event on each row, the first 6 columns contain basic information about the event (e.g. gene name, length, and coordinates) followed by a column with the PSI and a quality score column for each sample. Further details about the comma separated quality scores can be found here. This step also generates a similar table gene expression counts expressed as cRPKM or TPM.

All individual replicates and the merged super samples were combined with vast-tools combine with options --add_version, --TPM and --norm for gene expression. For this analysis the output file has 721551 AS events.

2.4 compare

To calculate the differential inclusion level (∆PSI) between samples the combined table was processed with vast-tools compare using the knock down control samples (-a) and the following filtering parameters: --min_dPSI 15, --min_range 0, --max_dPSI 5, --print_sets, --noB3, --p_IR, --print_dPSI, and --GO. The HeLa and HepG2 knock downs were each compared to the respective controls separately. The comparison was done using the 3 individual replicates per condition or by using only the merged super sample. This is denoted in the file names as _uniq_ or _mrgd_.

2.5 tidy

To simplify and filter the full inclusion table the module vast-tools tidy produces a table with only events that have a minimum coverage that are more likely to relevant for downstream analysis.

Full inclusion table was filtered using vast-tools tidy with: --noB3, --p_IR, --add_names, and --log. The tidy module has an option to discard the VLOW events (see below). Here since there are replicates the VLOW events are include in the analysis.

2.6 diff

Lastly, a statistical differential splicing analysis was performed with vast-tools diff using Bayesian inference. Briefly , it calculates PSI posterior distributions using Bayes’ theorem and employs replicates to estimate joint posterior distributions. The statistical significance between two posterior distributions is determined by comparing their differences using empirical distributions (Han et al. 2017) (Weatheritt, Sterne-Weiler, and Blencowe 2016).

The vast-tools diff module was run with options: -n 10000 (lines to process in parallel), -m 0.15 (min ∆PSI/100) -e 10 (minimum reads), -z 16 (random seed), and --noPDF (do not plot the estimates).

3 Selecting differentially spliced events

I use 3 different methods to select differentially spliced events (DSEs).

3.1 Effect size method

The compare module pre-filters the events based on read coverage imbalance and other features. The events are then simply filtered by calculating average and individual ∆PSIs. Between control and test samples the PSI distribution must be non overlapping, i.e. at least 10 PSI difference between the maximum and minimum PSI of the 2 groups (--min_range 10).

Note

compare is a non-statistical approach

Differentially spliced events have a minimum |∆PSI| >= 15. This might have the caveat of missing AS events in lowly expressed genes.

3.2 Frequentist inference method

Using the inclusion table generated by the tidy module, the events were further filtered to have coverage in at least 2 samples per condition and statistical difference between the average PSI in each condition was calculate using the Student’s t-test followed by multiple hypothesis testing (BH) correction using rstatix::t_test in R.

Events with a P-value <= 0.01 and a |∆PSI| >= 10 were considered significant. The non parametric Mann-Whitney (rstatix::wilcox_test) was also used to check if not assuming normality impacts the results drastically in a separate appendix.

3.3 Bayesian inference method

The module diff uses:

  1. A uniform prior distribution Beta(α = 1, β = 1).

  2. A binomial likelihood function for the number of inclusion reads K ~ Binomial(Ψ, N).

where Ψ represents PSI for any AS event and N is the total number of junction reads per event. The formula K ~ Binomial(Ψ, N) means that the number of inclusion reads (K) is assumed to follow a binomial distribution with Ψ as the probability of inclusion (i.e., success) and N as the total number of junction reads per splicing event.

This prior distribution represents our initial beliefs about the PSI before observing any data, while the likelihood function represents the probability of observing the data given a specific PSI value. The likelihood function follows a binomial distribution, which is appropriate for count data like the number of inclusion reads.

According to Bayes’ theorem, our prior beliefs about PSI are updated based on observed data to obtain the posterior distribution. In this case, the prior distribution (uniform Beta) and the likelihood function (binomial) are “conjugate” to each other, which means that the posterior distribution can be expressed in a closed form using the same type of distribution as the prior. Therefore the posterior distribution over Ψ, is represented as a Beta distribution with parameters Ψ ≈ Beta(K + α, (N – K) + β).

If replicates are available, joint posterior distributions for a sample are estimated from sampling empirical posterior distributions of the replicates and fitting a new posterior Beta with maximum likelihood estimation (MLE) using the MASS::fitdist function in R. This basically means combining the information from multiple replicates to get a more robust estimate of the PSI. The statistical significance between two biological conditions, defined as the level of confidence in determining whether two posterior PSI distributions are significantly different from each other, modelled as two posterior distributions X~ Beta, and Y ~ Beta, is calculated as P(X – Y > 0), i.e., X is higher than Y. This probability can be estimated from the difference of empirical distributions sampled between X and Y such that P(X – Y > 0) =\(\sum_{i=1}^{n} (Xi –Yi > 0) / N\). Lastly, significantly differential events were additionally required to have a PSI difference > 10.

Using the diff module of vast-tools filter for events with a Minimum Value (MV) of |∆PSI| at 95% confidence >= 15%. This means that each event as a 95% probability to have a |∆PSI| between the 2 conditions higher or equal to the minimum value, which is 15% ∆PSI.

4 Set up

4.1 Packages

Load common packages that have to be pre-installed. Check Section 10 for package versions.

Code
library(ggplot2)
library(ggrepel)
library(ggforce)
library(pheatmap)
library(UpSetR)
library(viridis)
library(rstatix)
library(Cairo)

In addition, I developed an R package called niar to stream line some of the common analysis steps. It can be installed from my GitHub repository using:

Code
devtools::install_github("Ni-Ar/niar")
library(niar)

Here I load the package from my local repository with:

Code
local <- FALSE
if(local) {
    devtools::load_all(path = '~/mnt/narecco/software/R/niar')
} else if (local == FALSE) {
     devtools::load_all(path = '~/software/R/niar')
} else {
  stop('local is not a Logical')
}

4.2 Functions

Colour palettes:

Events coverage quality score colour palette.

Code
quality_score_colors <- c('#ffffcc', '#c2e699', '#78c679', '#31a354', '#006837')
quality_score_values <- c("N", "VLOW", "LOW", "OK", "SOK")
names(quality_score_colors) <- quality_score_values

SRRM2-KD palette

Code
sample_palette <- c('#ffb703',  '#219ebc')
names(sample_palette) <- c('HeLa', 'HepG2')

Some generic functions I wrote for this analysis:

A plotting function to explore sample’s ∆PSI of exons and introns

Code
#' Group differentially spliced exons and introns
#'
#' @param data A data frame with the columns `dPSI`, `COMPLEX`, `comparison`
#'
#' @return A ggplot
plot_dPSI_hist_grpd <- function(data, xlim = c(-60, 60) ) {
  

  exon_type <- c("S", "C1", "C2", "C3", "ANN", "MIC")
  intron_type <- c("IR")
  alt_ss <- c("Alt3", "Alt5")
  
  data |>
    select(dPSI, COMPLEX, comparison) |>
    mutate(TYPE = case_when(COMPLEX %in% exon_type ~ 'Exon',
                            COMPLEX %in% intron_type ~ 'Intron',
                            COMPLEX %in% alt_ss ~ 'Alt. s.s.',
                            ) ) |>
    # subset( !COMPLEX %in% c('Alt3', 'Alt5') ) |>
    mutate(TYPE = factor(TYPE, levels = c('Exon', 'Intron', 'Alt. s.s.'))) |>
    unique() |>
    ggplot() +
      aes(x = dPSI, fill = dPSI > 0 ) +
      facet_grid(TYPE ~ comparison, scales = 'free_x',
                 labeller = labeller(comparison = c(HeLa_SRRM2KD_mrgd = 'HeLa SRRM2 KD',
                                                    HepG2_SRRM2KD_mrgd = 'HepG2 SRRM2 KD') ) ) +
      geom_histogram(binwidth = 1, show.legend = F) +
      scale_x_continuous(n.breaks = 10) +
      scale_y_continuous(n.breaks = 4, expand = expansion(mult = c(0, 0.01))) +
      scale_fill_manual(values = c("TRUE" = "firebrick3", "FALSE" = "dodgerblue3")) +
      labs(x = "\u0394PSI (KD - Cntrl)", y = "Num. of events") +
      coord_cartesian(xlim = xlim) +
      theme_classic(base_family = 'Arial', base_size = 6) +
      theme(legend.position = 'none', 
            plot.background = element_blank(),
            panel.background = element_blank(), 
            panel.grid.major = element_line(colour = 'gray84', linewidth = 0.1),
            panel.grid.minor.y = element_blank(),
            panel.border = element_blank(),
            axis.text = element_text(colour = 'black'), 
            strip.background = element_blank() ) -> cmpr_dPSI_Hist
  return(cmpr_dPSI_Hist)
}

QC plot with the number of events within each quality score

Code
#' Show the number of events by quality score in a stacked bar plot
#'
#' @param data A dataframe with the complex `COMPLE` and `Quality_Score_Value`
#'
#' @return A ggplot
plot_quality_score_stacked <- function(data) {
  ggplot(data) +
    aes(x = COMPLEX, fill = Quality_Score_Value) +
    geom_bar(colour = 'black', linewidth = 0.2) +
    scale_fill_manual(values = quality_score_colors, name = "Score") +
    scale_y_continuous(expand = expansion(mult = 0, add = 1), n.breaks = 10) +
    labs(x = 'Type of AS event', y = 'Number of AS event') +
    theme_classic(base_family = 'Arial', base_size = 6) +
    theme(legend.position = 'inside',
          legend.position.inside = c(0.45, 0.95),
          legend.key.size = unit(2, units = 'mm'), 
          legend.direction = 'horizontal',
          plot.background = element_blank(),
          panel.background = element_blank(), 
          panel.grid.major.y = element_line(linewidth = 0.2, colour = 'grey84'),
          panel.grid.major.x = element_blank(),
          panel.grid.minor.y = element_blank(),
          panel.border = element_blank(),
          axis.text = element_text(colour = 'black'),
          axis.ticks = element_line(linewidth = 0.2),
          axis.ticks.x = element_blank(),
          axis.line = element_line(linewidth = 0.2)) -> cmpr_Scores_Bars
  return(cmpr_Scores_Bars)
}

Wrapper to explore the PSI and quality scores of specific events.

Code
plot_PSI_boxplot <- function(df, num_col = 5, order_by_dPSI = TRUE, legend_xy =  c(0.93, 0.1) ) {

  if (order_by_dPSI) {
    df <- df |>
    mutate(EVENT = fct_reorder(EVENT, dPSI)) 
  
    gene_order <- unique( df[order(df$EVENT, decreasing = T), ]$GENE)
    
    df <- df |>
      mutate(GENE = factor(GENE, levels = gene_order))
  }
  
  ggplot(df) +
    aes(x = PrettyGroup, y = PSI) +
    facet_wrap( ~ GENE + EVENT, scales = "free_x", ncol = num_col) +
    geom_boxplot(fill = NA, outlier.shape = NA, lwd = 0.2) +
    geom_point( aes(fill = Quality_Score_Value, shape = Replicate),
               size = 1.5, stroke = 0.2, position = position_dodge(width = 0.5)) +
    scale_shape_manual(values = c('Merge' = 22, 'A' = 21, 'Aa' = 21, 'Ab' = 21,
                                  'B' = 21, 'Ba' = 21, 'Bb' = 21, 
                                  'C' = 21, 'Ca' = 21, 'Cb' = 21 ) ) +
    scale_fill_manual(values = quality_score_colors, name = "Quality\nScore") +
    scale_x_discrete(labels = \(x) gsub(pattern = '_', replacement = '\n', x) ) +
    scale_y_continuous(breaks = seq(0, 100, 10), 
                       expand = expansion(mult = 0, add = 2)) +
    guides(fill = guide_legend(override.aes = list(shape = 21)),
           shape = 'none') +
    coord_cartesian(ylim = c(0, 100), clip = 'off') +
    labs(y = 'PSI') +
    theme_classic() +
    theme(axis.text = element_text(colour = "black"),
          axis.line = element_line(linewidth = 0.2),
          axis.title.x = element_blank(),
          strip.background = element_blank(),
          strip.text = element_text(vjust = 1, lineheight = 0, 
                                    margin = margin(t = 0, unit = 'mm')),
          panel.grid.major = element_line(colour = 'gray84', linewidth = 0.15),
          legend.position = 'inside',
          legend.position.inside = legend_xy ) 
}

Calculate z-score on a numeric matrix

Code
zScore <- function(mat) {
  row_means <- apply(mat, 1, mean)
  row_sd <- apply(mat, 1, sd)
  # handle degenerate distribution in which the denominator (row sd) of the z-score formula is zero. 
  # the z-score becomes undefined as the denominator of the z-score formula is zero.  
  # this is a rare situation in general. Here I fix it by diving by 1 instead.
  if( any(row_sd == 0) ) { row_sd[which(row_sd == 0)] <- 1 }
  zScore_mat <- (mat - row_means) / row_sd  
  return(zScore_mat)
}

Alternative formula to z-score that used median

Code
zMad <- function(mat) {
  row_medians <- apply(mat, 1, median)
  row_median_abs_dev <- abs(mat - row_medians)
  median_row_median_abs_dev <- apply(row_median_abs_dev, 1, median)
  # conceptually row_mad is the sd. 1.482 is a constant linked to the assumption of normality of the data
  row_mad <- median_row_median_abs_dev * 1.4826
  
  # If row max is zero fix it by diving by 1 instead.
  if( any(row_mad == 0) ) { row_mad[which(row_mad == 0)] <- 1 } 
   
  zMad_dev <- (mat - row_medians) / row_mad
  return(zMad_dev)
}
# mat <- head(tidy_full_mat)
# zMad(mat)
# zScore(mat)

Function I wrote to make sure alternative donors and acceptors are not repeated between up and down regulated sets (used in tidy analysis).

Code
#' Match Alternative 3' or 5' Splice Site IDs
#'
#' This function takes two sets of upregulated and downregulated alternative 3' or 5' splice site IDs 
#' (VAST-TOOLS format) and resolves them by ensuring the IDs are correctly matched, taking into account 
#' variations in the alternative splicing.
#'
#' @param up_ids A character vector of upregulated splice site IDs (VAST-TOOLS format).
#' @param do_ids A character vector of downregulated splice site IDs (VAST-TOOLS format).
#' @return A character vector of resolved and unique splice site IDs.
#' @examples
#' up_ids <- c("HsaALTA0001463-1/2", "HsaALTA0006052-1/2", "HsaALTA0006256-2/2",
#'             "HsaALTA1001090-3/3", "HsaALTA1015934-2/4")
#' do_ids <- c("HsaALTA0001463-2/2", "HsaALTA0001572-1/5", "HsaALTA0006052-2/2",
#'              "HsaALTA0006256-1/2", "HsaALTA0007106-4/5", "HsaALTA0008669-2/4", "HsaALTA1001090-2/3")
#' match_alt35_id(up_ids, do_ids)
#' # Returns: "HsaALTA0001463-1/2" "HsaALTA0006052-1/2" "HsaALTA0006256-2/2"
#' #          "HsaALTA1001090-3/3" "HsaALTA1015934-2/4" "HsaALTA0001572-1/5" 
#' #          "HsaALTA0007106-4/5" "HsaALTA0008669-2/4"
match_alt35_id <- function(up_ids, do_ids) {
  # Calculate the number of unique IDs combining both upregulated and downregulated sets
  all_up_do_ids_len <- length(unique(c(up_ids, do_ids)))
  
  # Remove specific splice variation suffix (-x/y) and then calculate the number of unique IDs
  sam_up_do_ids_len <- length(unique(c(str_remove(string = up_ids, pattern = '\\-[1-9]/[1-9]'),
                                       str_remove(string = do_ids, pattern = '\\-[1-9]/[1-9]'))))
  
  # Case 1: If the number of unique IDs remains the same after removing variations, return the unique set
  if (all_up_do_ids_len == sam_up_do_ids_len) {
    return(unique(c(up_ids, do_ids)))
  } 
  # Case 2: If the number of unique IDs differs, identify and filter specific splice IDs
  else if (all_up_do_ids_len > sam_up_do_ids_len) {
    
    # Check if the downregulated set has more or equal IDs compared to the upregulated set
    if (length(do_ids) >= length(up_ids)) {
      # Identify and retain downregulated splice IDs that do not match upregulated IDs without the suffix
      indx_do_ids <- which(!str_remove(string = do_ids, pattern = '\\-[1-9]/[1-9]') %in%
                           str_remove(string = up_ids, pattern = '\\-[1-9]/[1-9]'))
      do_ids <- do_ids[indx_do_ids]
      return(unique(c(up_ids, do_ids)))
    }
    
    # Check if the upregulated set has more or equal IDs compared to the downregulated set
    if (length(up_ids) >= length(do_ids)) {
      # Identify and retain upregulated splice IDs that do not match downregulated IDs without the suffix
      indx_up_ids <- which(!str_remove(string = up_ids, pattern = '\\-[1-9]/[1-9]') %in%
                           str_remove(string = do_ids, pattern = '\\-[1-9]/[1-9]'))
      up_ids <- up_ids[indx_up_ids]
      return(unique(c(up_ids, do_ids)))
    }
  }
}

4.3 File paths

Files and directories paths. All files are backed-up on the CRG cluster.

Code
if(local) {
    proj_dir <- file.path('~/mnt/narecco/projects/01_ALTdemix')
} else if (local == FALSE) {
    proj_dir <- file.path('~/projects/01_ALTdemix')
} else {
    stop('local is not a Logical')
}
proj_dir <- normalizePath(proj_dir)
expr_dir <- file.path(proj_dir, 'data/INCLUSION_tbl/Tanja')
vast_tools_dir <- file.path(expr_dir, 'vast_tools')
vast_out <- file.path(expr_dir, 'vast_tools/vast_out')

psi_path <- file.path(vast_out, 'INCLUSION_LEVELS_FULL-hg38-34-v251.tab')
expr_path <- file.path(vast_out, 'TPM-hg38-34-NORM.tab')

cmpr_dir <- file.path(vast_out, 'compare_2024_03_26/min_dPSI15_min_range0_max_dPSI5')
cmpr_HeLa_SRRM2_uniq_path <- file.path(cmpr_dir, 'HeLa_SRRM2KD_vs_CNTRL_uniq_noB3_pIR.tab')
cmpr_HepG2_SRRM2_uniq_path <- file.path(cmpr_dir, 'HepG2_SRRM2KD_vs_CNTRL_uniq_noB3_pIR.tab')
cmpr_HeLa_SRRM2_mrgd_path <- file.path(cmpr_dir, 'HeLa_SRRM2KD_vs_CNTRL_mrgd_noB3_pIR.tab')
cmpr_HepG2_SRRM2_mrgd_path <- file.path(cmpr_dir, 'HepG2_SRRM2_vs_CNTRL_mrgd_noB3_pIR.tab')

tidy_dir <- file.path(vast_out, 'tidy_2024_03_25')
tidy_psi_path <- file.path(tidy_dir, 'INCLUSION_LEVELS_TIDY_HeLa_HepG2_noB3_pIR.tab')

diff_dir <- file.path(vast_out, 'diff_2024_03_25/min_dPSI_min_reads10')
diff_HeLa_SRRM2_path <- file.path(diff_dir, 'HeLa_SRRM2KD_minRead10_noB3_pIR_fltrd_MVdPSI15.tab')
diff_HepG2_SRRM2_path <- file.path(diff_dir, 'HepG2_SRRM2KD_minRead10_noB3_pIR_fltrd_MVdPSI15.tab')

plot_dir <- file.path(expr_dir, 'plots/SRRM2')
if ( !dir.exists(plot_dir) ) { dir.create(path = plot_dir, recursive = T) }

dse_IDs_dir <- file.path(expr_dir, 'diff_spliced_IDs/SRRM2-KD')
if ( !dir.exists(dse_IDs_dir) ) { dir.create(path = dse_IDs_dir, recursive = T) }

5 Differential splicing analysis

Select the differentially spliced events based on the 3 different methods described above. This is the core part of this report.

5.1 Effect size method (compare)

Import all the tables from the vast-tools compare analysis in a list.

Code
list.files(cmpr_dir, full.names = T, pattern = '^He.*SRRM2.*_noB3_pIR.tab$') |>
  lapply(function(x) {
    read_vst_tbl(path = x, show_col_types = FALSE) |>
      tidy_vst_psi() |>
      mutate(comparison = gsub(pattern = '_noB3_pIR.tab', replacement = '', x) ) |>
      mutate(comparison = gsub(pattern = paste0(cmpr_dir, '/'), replacement = '', comparison) ) |>
      mutate(comparison = gsub(pattern = "vs_CNTRL_", replacement = '', comparison) ) |>
      mutate(CellType = str_extract(pattern = "^(HepG2|HeLa)", string =  Sample) )
      # mutate(comparison = gsub(pattern = "(HepG2|HeLa)_", replacement = '', comparison) ) |>
  }) -> list_compares

Bind the list of data frames into one single data frame.

Code
cmpr_events <- do.call('rbind', list_compares) 

Create a metadata data frame from the samples’ names.

Code
cmpr_events |>
  select(Sample, CellType, comparison) |>
  unique() |>
  mutate(KD = gsub(pattern = 'KD_.*$', replacement = '', comparison ) )  |>
  mutate(KD = str_remove(string = KD, pattern = '(HepG2|HeLa)_') ) |>
  mutate(Replicate = str_split_fixed(string = Sample, pattern = "_", n = 3)[,3]) |>
  mutate(Replicate = ifelse(Replicate == '', yes = 'Merge', no = Replicate ) ) |>
  mutate(PrettySample = str_remove(string = Sample, pattern = '(HepG2|HeLa)_') ) |>
  mutate(PrettyGroup = str_remove(string = Sample, pattern = '_[aA-cC].$'),
         PrettyGroup = str_remove(string = PrettyGroup, pattern = '_(A|B|C)$') ) |>
  mutate(PrettyGroup = factor(PrettyGroup, 
                              levels = c("HeLa_CNTRL", "HeLa_SON", "HeLa_SRRM2",
                                         "HepG2_CNTRL", "HepG2_MARS", 
                                         "HepG2_RARS", "HepG2_SRRM2") ) ) |>
  relocate(PrettySample, PrettyGroup, .after = Sample) |>
  # drop comparison column
  select( !c(comparison) ) |>
  unique() -> mtdf

Add metadata info to the compared events table.

Code
cmpr_events <- left_join(x = cmpr_events, y = mtdf, 
                         by = join_by(Sample, CellType) ) 

5.1.1 Events coverage QC

Note

Not all AS events are easily measurable and not all events have the same sequencing coverage.

The AS events quantified by vast-tools are divided in different subgroups based on their complexity, which is defined in the 6th column (COMPLEX) of the compare table. This complexity resemble the underlying exon-intron structures in the gene body. The complexity of an AS exon was established based on Score 5 from the quality column of the inclusion table. It summaries the number of reads that come from the “reference” C1-A, A-C2 and C1-C2 junctions for a wide panel of RNA-seq samples (see Irimia et al. 2014 for further information). The complexity score is the same for all the samples for any given event.

Namely:

  • S, C1, C2, C3 are types of exon skipping events quantified by the splice site-based or transcript-based modules, with increasing degrees of complexity (e.g. S = simple; C3 more complex than C2).
  • ANN are exon skipping events quantified by the ANNOTATION module. Their IDs start with 6 or 7 (e.g. HsaEX6000001 or HsaEX7000001). These are exons from the reference GTF annotation used to build VAST-TOOLS. However, some annotated events are not present, because of low mappability or reads imbalance.
  • MIC are exon skipping events quantified by the microexon pipeline.
  • IR are intron retention event.
  • Alt3 are alternative acceptor events at the 3’ splice site of an exon.
  • Alt5 are alternative donor events at the 5’ splice site of an exon.

In addition, an AS event can be measured at different sequencing depths. In vast-tool this is resembled by Score 1 of the quality column. Each event in each samples is given a score based on the actual read counts, before mappability correction. This coverage scores are: N, VLOW, LOW, OK, and SOK, (where N = Not meeting minimum threshold, VLOW = Very low and SOK = Super okay). In general anything above N can be considered reliable and worth considering, but VLOW events might have an actual different PSI value if confirmed experimentally with a PCR on RT-cDNA.

Here I check if the quality score of the events changes when including or excluding the super sample.

Code
ggplot(cmpr_events) +
  aes(x = Quality_Score_Value, fill = comparison) +
  facet_grid(~COMPLEX)  +  
  geom_bar(position = position_dodge(), width = 0.75, colour = 'black', linewidth = 0.1) +
  scale_fill_manual(values = c('#FFCC33', 'darkgoldenrod', '#009990', 'green4'),
                    name = 'compared against control') +
  scale_y_continuous(expand = expansion(add = c(0, 1)), n.breaks = 10) +
  labs(x = 'AS event quantification quality score', 
       y = 'Number of AS events') +
  theme_bw(base_family = 'Arial') +
  theme(axis.text = element_text(colour = 'black'),
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.x = element_blank(),
        strip.background = element_blank(),
        plot.background = element_blank(),
        legend.position = 'bottom',
        legend.key.size = unit(x = 3, units = 'mm'),
        legend.background = element_blank(),
        legend.margin = margin(t = 0, unit = 'mm'),
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linewidth = 0.1, colour = 'gray84'),
        panel.background = element_blank(),
        panel.border = element_rect(linewidth = 0.2, colour = 'black')) -> p_QC_events_by_type
p_QC_events_by_type

As one can see there’s a gain in lower quality events if including the merged replicates super samples. This behaviour is kind of expected.

Code
ggsave(filename = '01_QC_AS_Events_by_Type_Coverage_Barplot.pdf', plot = p_QC_events_by_type, 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 30,
       height = 10)

When filtering out for N events…

Code
fltrd_cmpr_events <- subset(cmpr_events, as.integer(Quality_Score_Value) >= 2)

…and repeating the plot.

Code
ggplot(fltrd_cmpr_events) +
  aes(x = Quality_Score_Value, fill = comparison) +
  facet_grid(~COMPLEX)  +  
  geom_bar(position = position_dodge(), width = 0.75, colour = 'black', linewidth = 0.1) +
  scale_fill_manual(values = c('#FFCC33', 'darkgoldenrod', '#009990', 'green4'), 
                    name = 'compared against control') +
  scale_y_continuous(expand = expansion(add = c(0, 1)), n.breaks = 10) +
  labs(x = 'AS event quantification quality score', 
       y = 'Number of AS events') +
  theme_bw(base_family = 'Arial') +
  theme(axis.text = element_text(colour = 'black'),
        axis.text.x = element_text(angle = 45, hjust = 1),
        axis.ticks.x = element_blank(),
        strip.background = element_blank(),
        plot.background = element_blank(),
        legend.position = 'bottom',
        legend.key.size = unit(x = 3, units = 'mm'),
        legend.background = element_blank(),
        legend.margin = margin(t = 0, unit = 'mm'),
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(linewidth = 0.1, colour = 'gray84'),
        panel.background = element_blank(),
        panel.border = element_rect(linewidth = 0.2, colour = 'black')) -> p_fltrd_QC_events_by_type
p_fltrd_QC_events_by_type

The number of DSEs is usally always higher for the mrgd super samples comparison. There seems to be a good accordance in the number of DSE in HeLa and HepG2 cells. Also there are basically no microexon events differentially spliced.

Code
ggsave(filename = '02_QC_AS_Events_No_N_by_Type_Filtered_by_Coverage_Barplot.pdf', 
       plot = p_fltrd_QC_events_by_type, device = cairo_pdf, 
       path = plot_dir, units = 'cm', width = 30, height = 10)

5.1.2 Shared and Specific DSEs

How many of these events overlap between the different comparisons?

Make a list with all the differentially spliced events from each comparison, again filtering out the N events.

Code
# get the filtered events
lapply( list_compares, function(x){
  subset(x, as.integer(Quality_Score_Value) >= 2) |>
    select(GENE, EVENT, comparison) |> unique()
}) -> diff_events_li

# find events that are mis-spliced in all 4 comparisons
# all_common_DSE_ids <- Reduce(intersect, lapply(diff_events_li, function(x) x$EVENT ) )

# find the HeLa events mis-spliced in both the merged supersample and the individual replicates
HeLa_common <- Reduce(intersect, lapply(diff_events_li[1:2], function(x) x$EVENT ) )

# find the HepG2 events mis-spliced in both the merged supersample and the individual replicates
HepG2_common <- Reduce(intersect, lapply(diff_events_li[3:4], function(x) x$EVENT ) )

message('Number of DSE (uniq & mrgd) common between HeLa and HepG2: ', length(intersect(x = HeLa_common, y = HepG2_common)), " !")
Number of DSE (uniq & mrgd) common between HeLa and HepG2: 0 !

This is because some events are lost in the uniq comparison type. If I check only between HeLa and HepG2 supersamples.

Code
HeLa_HepG2_mrgd_common <- Reduce(intersect, lapply( diff_events_li[c(1,3)] , function(x) x$EVENT ) )

message('Number of DSE (mrgd) common between HeLa and HepG2: ', length(HeLa_HepG2_mrgd_common), " !")
Number of DSE (mrgd) common between HeLa and HepG2: 42 !

Check some of them:

Code
set.seed(9)
subset(fltrd_cmpr_events, EVENT %in% sample(HeLa_HepG2_mrgd_common, 9) )  |>
  subset(PrettyGroup %in% c('HeLa_CNTRL', 'HeLa_SRRM2', 'HepG2_CNTRL', 'HepG2_SRRM2') ) |>
  plot_PSI_boxplot(order_by_dPSI = T) -> p_examples

p_examples

Code
ggsave(filename = '03_HeLa_HepG2_Examples_PSI_Changes_Barplot.pdf', plot = p_examples, 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 27,
       height = 10)

Some ∆PSI are not in the same direction when comparing the HeLa and HepG2 cells.

Check the events in the SRRM2 gene.

Code
subset(fltrd_cmpr_events, GENE == 'SRRM2' )  |>
  subset(PrettyGroup %in% c('HeLa_CNTRL', 'HeLa_SRRM2', 'HepG2_CNTRL', 'HepG2_SRRM2') ) |>
  plot_PSI_boxplot(legend_xy = c(0.9, 0.7))

Code
ggsave(filename = '04_HeLa_HepG2_SRRM2_PSI_Changes_Barplot.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 14,
       height = 6)

The exon seem to make sense. The intronic event if it’s more included in the KD in HeLa would likely contribute to NMD. This is very likely a casa in high the intron of SRRM2 is regulated by SRRM2 itself to auto-regulate its abundance.

Make a list of different compare sets of DSEs

Code
# get comparisons names
comparisons <- lapply( list_compares, function(x){ pull(x, comparison) |> unique() }) |> unlist()

# get events in each comparison
events_li <- lapply( list_compares, function(x){ 
  subset(x, as.integer(Quality_Score_Value) >= 2) |>
    pull(EVENT) |> unique() 
}) 

names(events_li) <- comparisons
lapply(events_li, head, 3)
$HeLa_SRRM2KD_mrgd
[1] "HsaEX0056692" "HsaEX0057076" "HsaEX0051924"

$HeLa_SRRM2KD_uniq
[1] "HsaEX0047222" "HsaEX0036226" "HsaEX0037131"

$HepG2_SRRM2KD_mrgd
[1] "HsaEX0017128" "HsaEX0031099" "HsaEX1013836"

$HepG2_SRRM2KD_uniq
[1] "HsaEX1013836" "HsaEX0034749" "HsaEX0028282"

How many events in each group?

Code
lapply(events_li, length)
$HeLa_SRRM2KD_mrgd
[1] 1312

$HeLa_SRRM2KD_uniq
[1] 117

$HepG2_SRRM2KD_mrgd
[1] 1943

$HepG2_SRRM2KD_uniq
[1] 179

To show how many AS events are differentially spliced in each condition taking into consideration the replicates or the super sample, calculate the intersections between all 4 comparisons.

Code
intersection_cmpr_plt <- upset(fromList(events_li), nsets = length(comparisons),
                               text.scale = 1.5, nintersects = NA, 
                               mb.ratio = c(0.65, 0.35), decreasing = c(T, F),
                               order.by = 'freq',
                               main.bar.color = "gray16") 

Show the overlap as an upset plot:

Code
intersection_cmpr_plt

The plot above shows all possible intersections between the 4 groups.

The weird result is that there’s a very small between the knockdown in HeLa or HepG2 cells!! This requires further exploration.

Because the super samples (defined with _mrgd) are the sum of all the 3 individual replicates, they have more coverage and can better quantify exon-exon-junction reads. This explains the increased in number of DSEs.

Save the upset plot to pdf.

Code
cairo_pdf(file = file.path(plot_dir, '05_overlap_fltrd_compare_DSE_Upset.pdf'), 
          width = 6.5, height = 4, bg = NA)
intersection_cmpr_plt
dev.off()
png 
  2 

Separate the parts of the dataframe by the cell type to look at the ∆PSI of the mrgd super samples.

Code
cmpr_events_HeLa <- subset(fltrd_cmpr_events, comparison == 'HeLa_SRRM2KD_mrgd') |>
  subset( grepl(pattern = '(CNTRL|SRRM2)', x = Sample, perl = T) ) |>
  subset( grepl(pattern = '^HeLa_', x = Sample, perl = T) )
Code
cmpr_events_HepG2 <- subset(fltrd_cmpr_events, comparison == 'HepG2_SRRM2KD_mrgd') |>
  subset( grepl(pattern = '(CNTRL|SRRM2)', x = Sample, perl = T) ) |>
  subset( grepl(pattern = '^HepG2_', x = Sample, perl = T) )

Check the overall quality / confidence of the mrgd events. This plot should be improved.

Code
plot_quality_score_stacked(cmpr_events_HeLa) +
  labs(title = 'HeLa mrgd') -> p_cmpr_scores_bars_HeLa

plot_quality_score_stacked(cmpr_events_HepG2) +
  labs(title = 'HepG2 mrgd') -> p_cmpr_scores_bars_HepG2

p_cmrp_scores_bar <- p_cmpr_scores_bars_HeLa / p_cmpr_scores_bars_HepG2
p_cmrp_scores_bar

The different numbers of events tend to be similar except maybe for intron retentions and simple exon skipping that have opposite trends. Save bar plot.

Code
ggsave(filename = '06_Compare_DSE_Score_Bar.pdf', plot = p_cmrp_scores_bar, 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 7,
       height = 8)

Check the distribution of ∆PSI using a histogram with binwidth of 1. The top part are all differentially spliced exons (S, C1, C2, and C3), while the bottom part are the intron retentions (IR).

Code
plot_dPSI_hist_grpd(data = cmpr_events_HeLa) +
  labs(title = 'HeLa mrgd') -> p_cmpr_dPSI_hist_HeLa

plot_dPSI_hist_grpd(data = cmpr_events_HepG2) +
  labs(title = 'HepG2 mrgd') -> p_cmpr_dPSI_hist_HepG2

hist_list <- list(p_cmpr_dPSI_hist_HeLa, p_cmpr_dPSI_hist_HepG2)
p_cmpr_dPSI_hist <- wrap_plots(hist_list, byrow = F, heights = c(1, 1) )
p_cmpr_dPSI_hist

The breath of the ∆PSI of the up-regulated exons is somewhat comparable, however the number of down-regulated exons in HepG2 is way lower than what detected in HeLa. The HepG2 cells have many less retained introns upon SRRM2 KD, which is not observed to the same extent in HeLa cells. Save histogram plot.

Code
ggsave(filename = '07_compare_DSE_dPSI_Hist.pdf', plot = p_cmpr_dPSI_hist, 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 10,
       height = 12)

Select events that are differentially spliced using the comparison method

I select all the events in the mrgd super samples are differentially spliced events, defined as: |∆PSI| >= 15.

Separate between the exons, introns, and alternative splice sites that are up or down spliced in SRRM2 KD in HeLa.

Code
cmpr_events_HeLa |>
  subset(abs(dPSI) >= 15 ) |>
  select(GENE, EVENT, dPSI) |>
  unique() |>
  arrange(desc(dPSI)) -> cmpr_HeLa

up_cmpr_HeLa <- subset(cmpr_HeLa, dPSI >= 0)
do_cmpr_HeLa <- subset(cmpr_HeLa, dPSI <= 0)

EX_up_cmpr_HeLa <- grep(pattern = "^HsaEX", x = up_cmpr_HeLa$EVENT, value = T)
EX_do_cmpr_HeLa <- grep(pattern = "^HsaEX", x = do_cmpr_HeLa$EVENT, value = T)

IN_up_cmpr_HeLa <- grep(pattern = "^HsaINT", x = up_cmpr_HeLa$EVENT, value = T)
IN_do_cmpr_HeLa <- grep(pattern = "^HsaINT", x = do_cmpr_HeLa$EVENT, value = T)

A3_up_cmpr_HeLa <- grep(pattern = "^HsaALTA", x = up_cmpr_HeLa$EVENT, value = T)
A3_do_cmpr_HeLa <- grep(pattern = "^HsaALTA", x = do_cmpr_HeLa$EVENT, value = T) # empty

A3_cmpr_HeLa <- match_alt35_id(up_ids = A3_up_cmpr_HeLa, do_ids = A3_do_cmpr_HeLa)

A5_up_cmpr_HeLa <- grep(pattern = "^HsaALTD", x = up_cmpr_HeLa$EVENT, value = T)
A5_do_cmpr_HeLa <- grep(pattern = "^HsaALTD", x = do_cmpr_HeLa$EVENT, value = T) # empty

A5_cmpr_HeLa <- match_alt35_id(up_ids = A5_up_cmpr_HeLa, do_ids = A5_do_cmpr_HeLa)

message('HeLa SRRM2 KD:',
        '\nUP Exon: ', length(EX_up_cmpr_HeLa),
        '\nDOWN Exon: ', length(EX_do_cmpr_HeLa),
        '\nUP Intron: ', length( IN_up_cmpr_HeLa),
        '\nDOWN Intron: ', length( IN_do_cmpr_HeLa),
        "\nDifferentially spliced 3' splice sites: ", length(A3_cmpr_HeLa),
        "\nDifferentially spliced 5' splice sites: ", length(A5_cmpr_HeLa) )
HeLa SRRM2 KD:
UP Exon: 343
DOWN Exon: 416
UP Intron: 132
DOWN Intron: 74
Differentially spliced 3' splice sites: 177
Differentially spliced 5' splice sites: 170

Separate between the exons, introns, and alternative splice sites that are up or down spliced in SRRM2 KD in HepG2.

Code
cmpr_events_HepG2 |>
  subset(abs(dPSI)  >= 15 ) |>
  select(GENE, EVENT, dPSI) |>
  unique() |>
  arrange(desc(dPSI)) -> cmpr_HepG2

up_cmpr_HepG2 <- subset(cmpr_HepG2, dPSI >= 0)
do_cmpr_HepG2 <- subset(cmpr_HepG2, dPSI <= 0)

EX_up_cmpr_HepG2 <- grep(pattern = "^HsaEX", x = up_cmpr_HepG2$EVENT, value = T)
EX_do_cmpr_HepG2 <- grep(pattern = "^HsaEX", x = do_cmpr_HepG2$EVENT, value = T)

IN_up_cmpr_HepG2 <- grep(pattern = "^HsaINT", x = up_cmpr_HepG2$EVENT, value = T)
IN_do_cmpr_HepG2 <- grep(pattern = "^HsaINT", x = do_cmpr_HepG2$EVENT, value = T)

A3_up_cmpr_HepG2 <- grep(pattern = "^HsaALTA", x = up_cmpr_HepG2$EVENT, value = T)
A3_do_cmpr_HepG2 <- grep(pattern = "^HsaALTA", x = do_cmpr_HepG2$EVENT, value = T) # empty

A3_cmpr_HepG2 <- match_alt35_id(up_ids = A3_up_cmpr_HepG2, do_ids = A3_do_cmpr_HepG2)

A5_up_cmpr_HepG2 <- grep(pattern = "^HsaALTD", x = up_cmpr_HepG2$EVENT, value = T)
A5_do_cmpr_HepG2 <- grep(pattern = "^HsaALTD", x = do_cmpr_HepG2$EVENT, value = T) # empty

A5_cmpr_HepG2 <- match_alt35_id(up_ids = A5_up_cmpr_HepG2, do_ids = A5_do_cmpr_HepG2)

message('HepG2 SRRM2 KD:',
        '\nUP Exon: ', length(EX_up_cmpr_HepG2),
        '\nDOWN Exon: ', length(EX_do_cmpr_HepG2),
        '\nUP Intron: ', length(IN_up_cmpr_HepG2),
        '\nDOWN Intron: ', length(IN_do_cmpr_HepG2),
        "\nDifferentially spliced 3' splice sites: ", length(A3_cmpr_HepG2),
        "\nDifferentially spliced 5' splice sites: ", length(A5_cmpr_HepG2) )
HepG2 SRRM2 KD:
UP Exon: 521
DOWN Exon: 289
UP Intron: 90
DOWN Intron: 507
Differentially spliced 3' splice sites: 279
Differentially spliced 5' splice sites: 257

There are some changes in AS upon SRRM2 KD in HeLa or in HepG2 cells.

5.2 Frequentist inference method (from tidy)

Start by importing the tidy data set. Here “tidy” means a subset of all identified AS events that have generally a good score (i.e. coverage) and balanced read counts on both splice sites.

Code
tidy_events <- read_vst_tbl(path = tidy_psi_path, show_col_types = FALSE) |>
  mutate(GENE = str_remove(pattern = '=Hsa.*$', string = EVENT) ) |>
  mutate(EVENT = str_remove(pattern = '^*.*=', string = EVENT) ) |>
  relocate(GENE, .before = EVENT) |>
  pivot_longer(cols = starts_with('He'), names_to = 'Sample', values_to = 'PSI') 

message('Tidy events: ', length(unique(tidy_events$EVENT)))
Tidy events: 34490
Code
tidy_events_backup <- tidy_events

Select only the samples from the SRRM2 KD experiment

Code
tidy_events <- tidy_events[grepl(pattern = '(CNTRL|SRRM2)', x = tidy_events$Sample, perl = T), ]

5.2.1 PCA

First let’s use these events to check the dispersion of the samples with a PCA. First turn the PSI dataframe into a matrix

Code
tidy_events |> 
  select(GENE, EVENT, Sample, PSI) |>
  pivot_wider(id_cols = EVENT, names_from = Sample, values_from = PSI) |>
  column_to_rownames('EVENT') |>
  as.matrix() -> tidy_mat

Plot the PCA using the function showme_PCA2D form my R package and save it.

Code
showme_PCA2D(log2(tidy_mat), mt = mtdf, mcol = 'Sample', 
             real_aspect_ratio = F, m_fill = 'CellType', m_label = 'Sample',
             n_top_var = 1000)

Code
ggsave(filename = '10_tidy_PCA_HeLa_HepG2.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 15,
       height = 10)
Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

The first principal component separates between the 2 cell lines and probably there’s a big influence in batch effect between the 2 different experiments.

Code
showme_PCA2D(log2(tidy_mat), mt = mtdf, mcol = 'Sample', x = 2,
             real_aspect_ratio = F, m_fill = 'CellType', 
             m_label = 'Sample', n_top_var = 1000)

The second and third components start separating the data based on the biological KD, but the variance explained in the data is minimal.

It’s probably better to analyse the HeLa cells separately from the HepG2 to focus more on true biological effect.

Code
tidy_events_HeLa <- tidy_events_backup[grepl(pattern = 'HeLa', x = tidy_events_backup$Sample, perl = T), ]
tidy_events_HeLa |> 
  select(GENE, EVENT, Sample, PSI) |>
  pivot_wider(id_cols = EVENT, names_from = Sample, values_from = PSI) |>
  column_to_rownames('EVENT') |>
  as.matrix() -> tidy_mat_HeLa

showme_PCA2D(mat = log2(tidy_mat_HeLa), mt = mtdf, mcol = 'Sample', real_aspect_ratio = F,
             m_fill = 'PrettyGroup', m_label = 'PrettySample', n_top_var = 1000)

Code
ggsave(filename = '11_tidy_PCA_HeLa.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 15,
       height = 10)

The PC1 is separating the SON KD down-samples while PC2 is separating SRRM2.

Code
tidy_events_HepG2 <- tidy_events_backup[grepl(pattern = 'HepG2', x = tidy_events_backup$Sample, perl = T), ]
tidy_events_HepG2 |> 
  select(GENE, EVENT, Sample, PSI) |>
  pivot_wider(id_cols = EVENT, names_from = Sample, values_from = PSI) |>
  column_to_rownames('EVENT') |>
  as.matrix() -> tidy_mat_HepG2

showme_PCA2D(mat = log2(tidy_mat_HepG2), mt = mtdf, mcol = 'Sample', real_aspect_ratio = F,
             m_fill = 'PrettyGroup', m_label = 'PrettySample', n_top_var = 1000)

Code
ggsave(filename = '12_tidy_PCA_HepG2.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 15,
       height = 10)

Here, PC1 is separating a bit of everything, and again PC2 is splitting SRRM2 vs MARS KD. In the HepG2 dataset the samples are a bit more spread out, with replicates Ca and Cb being a bit of an outlier.

5.2.2 Heatmaps

Explore the samples with a heatmap of the AS events. First let’s check how many events are missing the PSI (e.g. not enough coverage). In the next heatmap the colour white represent NA values and black represent a quantification (from 0 to 100). Of course the merged super samples have more coverage. I don’t think there’s a specific patter and I’d say that the values are missing at random.

Code
cairo_pdf(file = file.path(plot_dir, '13_tidy_missing_values_heatmap.pdf'), 
          width = 5.5, height = 8, bg = NA)
pheatmap(tidy_mat, na_col = 'white', cluster_rows = F, cluster_cols = T,
         legend = F, show_rownames = F, color = 'black', 
         main = paste0( nrow(tidy_mat), ' AS events' ) )

Code
dev.off()
cairo_pdf 
        3 

One can see that the HepG2 samples are probably more deeply sequenced and have less missing values.

Since the the super samples have more coverage let’s plot only the AS events in those 3 columns. NA are again displayed in white.

Code
mat <- tidy_mat[, c('HeLa_CNTRL', 'HeLa_SRRM2', 'HepG2_CNTRL', 'HepG2_SRRM2')]

pheatmap(mat = mat, na_col = 'white', cluster_rows = F, cluster_cols = T,
         main = paste0( nrow(mat), ' AS events'), angle_col = 90,
         show_rownames = F, color = viridis(n = 100), cutree_cols = 4)

Code
rm(mat)

The trend is pretty much the same, but HepG2 have more coverage than HeLa cells.

5.2.3 Testing for significance in HeLa cells

To detect subtle changes with a more statistical framework I try to calculate a P value using the individual replicates PSI and plot them against the ∆PSI as a Volcano plot. To achieve this first define the sample types.

Code
mrgd_cntrl <- "HeLa_CNTRL"
mrgd_SRRM2 <- "HeLa_SRRM2"

CNTRLKD <- c("HeLa_CNTRL_A", "HeLa_CNTRL_B", "HeLa_CNTRL_C")
SRRM2KD <- c("HeLa_SRRM2_A", "HeLa_SRRM2_B", "HeLa_SRRM2_C")

Filter for samples that have finite ∆PSI (meaning that is not just NA in one of the mrgd samples) and that have maximum one sample per condition as NA (meaning at least 2 samples are not NA). This is step is done only on the SRRM2 KD in HeLa cells.

Code
te_HeLa_SRRM2 <- tidy_events_HeLa[grepl(pattern = '(CNTRL|SRRM2)', x = tidy_events_HeLa$Sample, perl = T), ]

Filter the data.

Code
te_HeLa_SRRM2 |>
  group_by(EVENT) |>
  summarise(mPSI_CNTRL = mean(PSI[Sample %in% mrgd_cntrl], na.rm = T),
            mPSI_SRRM2 = mean(PSI[Sample %in% mrgd_SRRM2], na.rm = T),
            
            Samples_NA_CNTRL = sum(is.na(PSI[Sample %in% CNTRLKD])),
            Samples_NA_SRRM2 = sum(is.na(PSI[Sample %in% SRRM2KD])) ) |>
  # calculate ∆PSI
  mutate(dPSI_SRRM2 = mPSI_SRRM2 - mPSI_CNTRL ) |>
  subset( is.finite(dPSI_SRRM2) ) |>
  # select samples with max 1 NA PSI value per event
  # i.e. at least 2 samples with a finite PSI
  subset( Samples_NA_CNTRL <= 1) |>
  subset( Samples_NA_SRRM2 <= 1) |>
  select(EVENT, dPSI_SRRM2) |>
  unique() |>
  # Add the ∆PSI of the filtered events to the original data frame
  left_join(y = te_HeLa_SRRM2, by = join_by(EVENT)) |>
  relocate(GENE, .before = EVENT) |>
  relocate(starts_with('dPSI'), .after = PSI) |>
  # add metadata information
  left_join(y = mtdf, by = join_by(Sample)) |>
  # remove merged super sample PSIs
  subset( !Replicate == 'Merge') |>
  group_by(EVENT) |>
  # for the statistical test select only events that are changing a bit to enter the test
  # if not the data is essentially constant and statistical test fails 
  # this is calculated by measuring the PSI difference standard deviation that should not be zero
  mutate(dSD_SRRM2 = sd(PSI[Sample %in% CNTRLKD] - PSI[Sample %in% SRRM2KD],
                        na.rm = T) ) |>
  subset(dSD_SRRM2 != 0) |>
  select( !c(starts_with("dSD_"), PrettySample, KD, CellType)) |> # remove these columns
  # In addition one could also select the events that have a ∆PSI different from zero.
  # subset( abs(dPSI_SRRM2) >= 0.01 ) |>
  ungroup() -> tidy_dPSI_HeLa

message(length(unique(tidy_dPSI_HeLa$EVENT)), '/', 
        length(unique(tidy_events$EVENT)), ' (', 
        100*signif(length(unique(tidy_dPSI_HeLa$EVENT))/length(unique(tidy_events$EVENT)), 2), '%) ', 
        'filtered events that enter the statistical analysis')
10821/34490 (31%) filtered events that enter the statistical analysis
Important

The following analysis uses Student’s t-test on PSI values that are bounded between 0 and 100. Some people might find this “statistically wrong” as the PSI distribution does not follow normality. My argument is that PSI is close enough to normality to allow for the use of a t-test and the test itself is pretty robust to non-normality anyway. This means that I believe I can use the t-test for this data; not that I should.

Calculate P values for SRRM2 KD in HeLa.

Code
tidy_dPSI_HeLa |> 
  group_by(EVENT) |>
  t_test(formula = PSI ~ PrettyGroup, ref.group = 'HeLa_CNTRL', 
         alternative = "two.sided", paired = F, p.adjust.method = 'none',
         conf.level = 0.95, detailed = F) |>
  adjust_pvalue(method = "BH") |> add_significance("p.adj") |>
  select(EVENT, p, p.adj, p.adj.signif) |>
  setNames(c('EVENT', 'p_SRRM2_HeLa', 'padj_SRRM2_HeLa', 
             'padj_signif_SRRM2_HeLa')) -> pvals_SRRM2_HeLa

Combine all data frames into one. If an event as a P value that is NA, discard it.

Code
left_join(x = tidy_dPSI_HeLa, y = pvals_SRRM2_HeLa, by = join_by(EVENT)) |>
  subset(!is.na(p_SRRM2_HeLa)) |>
  subset(!is.na(padj_SRRM2_HeLa)) -> tidy_dPSI_pval_HeLa

Check the P value distribution.

Code
tidy_dPSI_pval_HeLa |>
  select(EVENT, p_SRRM2_HeLa) |>
  pivot_longer(cols = starts_with("p")) |>
  ggplot() +
  aes(x = value, fill = name) +
  geom_histogram(bins = 25, colour = 'black', show.legend = F) + 
  labs(x = "P values") + theme_bw() 

Well that’s not that promising but at least it is somewhat uniform.

To control for this I set a non-corrected P value threshold to 0.01, which is more conservative, plus the size effect threshold is |∆PSI| >= 10.

Code
dPSI_squish <- 40

tidy_dPSI_pval_HeLa <- tidy_dPSI_pval_HeLa |>
  # label the events that are below significant thresholds
  mutate(DSE_SRRM2_HeLa = p_SRRM2_HeLa < pval_thrshld & abs(dPSI_SRRM2) >= dPSI_thrshld ) 

5.2.3.1 Volcano plots

Volcano plot of SRRM2 KD in HeLa. Points with |∆PSI| > 40 are squished on the sides of the plot.

Code
# select fewer columns for plotting
subset(tidy_dPSI_pval_HeLa) |>
  select(GENE, EVENT, dPSI_SRRM2, p_SRRM2_HeLa, padj_SRRM2_HeLa, DSE_SRRM2_HeLa) |>
  unique() -> plot_HELA

ggplot(plot_HELA) +
  aes(x = dPSI_SRRM2, y = -log10(p_SRRM2_HeLa), fill = DSE_SRRM2_HeLa) +
  geom_hline(yintercept = -log10(pval_thrshld), linewidth = 0.1, colour = 'black') +
  geom_vline(xintercept = -dPSI_thrshld, linewidth = 0.1, colour = 'black') +
  geom_vline(xintercept = dPSI_thrshld, linewidth = 0.1, colour = 'black') +
  geom_text_repel(data = subset(plot_HELA, DSE_SRRM2_HeLa),
                  aes(label = GENE, x = dPSI_SRRM2, y = -log10(p_SRRM2_HeLa)),
                      seed = 16, show.legend = F, segment.curvature = -1e-20,
                      family = "Arial", size = 2, nudge_x = -0.25,
                      segment.color = 'black', verbose = F, lwd = 0.1,
                      box.padding = grid::unit(1, "mm"),
                      point.padding = grid::unit(0.55, "mm"),
                      max.overlaps = 20 ) +
  geom_point(shape = 21, show.legend = F, alpha = 1, stroke = 0.2) +
  scale_fill_manual(values = c('gray84', 'firebrick3') ) +
  scale_x_continuous(n.breaks = 10, oob = scales::squish, limits = c(-dPSI_squish, dPSI_squish),
                     expand = expansion(mult = 0, add = 0.5)) +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.05), add = 0) ) +
  coord_cartesian(ylim = c(0, 6.05)) + # set this to have the 2 volcano plots with the same Y-axis range
  labs( x = "\u0394PSI (SRRM2 KD - Cntrl)", y = expression(-log[10] ~ "P value") ) +
  theme_classic(base_family = 'Arial') +
  theme(axis.text = element_text(colour = 'black'),
        axis.line = element_line(colour = 'black', linewidth = 0.2),
        panel.grid.major = element_line(linewidth = 0.1),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.background = element_blank())

Code
ggsave(filename = '14_tidy_SRRM2_Volcano_HeLa.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 10,
       height = 12)

Check the PSI of some of them.

Code
subset(tidy_dPSI_pval_HeLa) |>
  select(GENE, EVENT, dPSI_SRRM2, p_SRRM2_HeLa, padj_SRRM2_HeLa, padj_signif_SRRM2_HeLa, DSE_SRRM2_HeLa) |>
  subset(DSE_SRRM2_HeLa) |>
  unique() |>
  arrange(padj_SRRM2_HeLa) |>
  head(10) |>
  pull(EVENT) -> top_tidy_events_HeLa

Grep the PSIs as not all of them are in the compared table. This my take a while…

Code
grep_psi(inclusion_tbl = psi_path, vst_id = top_tidy_events_HeLa) |>
  tidy_vst_psi() -> tmp

Plot some PSIs.

Code
left_join( x = tmp, y = mtdf, by = join_by(Sample)) |>
  subset(CellType == 'HeLa' ) |>
  plot_PSI_boxplot(num_col = 5, order_by_dPSI = F, legend_xy = c(1, 3))

As one can see in the plot above, all these events have very good accordance between replicates (i.e. low standard deviation). So applying the Student’s t-test on PSI values with a low P values filter (0.01) is a good approach for detecting events that are clearly differentially spliced.

Select the differentially spliced events in SRRM2 KD HeLa.

Code
subset(tidy_dPSI_pval_HeLa) |>
  select(GENE, EVENT, dPSI_SRRM2, p_SRRM2_HeLa, padj_SRRM2_HeLa, DSE_SRRM2_HeLa) |>
  subset(DSE_SRRM2_HeLa) |>
  unique() -> tidy_signif_HeLa

up_tidy_HeLa <- subset(tidy_signif_HeLa, dPSI_SRRM2 >= 0)
do_tidy_HeLa <- subset(tidy_signif_HeLa, dPSI_SRRM2 < 0)

EX_up_tidy_HeLa <- grep(pattern = "^HsaEX", x = up_tidy_HeLa$EVENT, value = T)
EX_do_tidy_HeLa <- grep(pattern = "^HsaEX", x = do_tidy_HeLa$EVENT, value = T)

IN_up_tidy_HeLa <- grep(pattern = "^HsaINT", x = up_tidy_HeLa$EVENT, value = T)
IN_do_tidy_HeLa <- grep(pattern = "^HsaINT", x = do_tidy_HeLa$EVENT, value = T)

A3_up_tidy_HeLa <- grep(pattern = "^HsaALTA", x = up_tidy_HeLa$EVENT, value = T)
A3_do_tidy_HeLa <- grep(pattern = "^HsaALTA", x = do_tidy_HeLa$EVENT, value = T)

A3_tidy_HeLa <- match_alt35_id(up_ids = A3_up_tidy_HeLa, do_ids = A3_do_tidy_HeLa)

A5_up_tidy_HeLa <- grep(pattern = "^HsaALTD", x = up_tidy_HeLa$EVENT, value = T)
A5_do_tidy_HeLa <- grep(pattern = "^HsaALTD", x = do_tidy_HeLa$EVENT, value = T)

A5_tidy_HeLa <- match_alt35_id(up_ids = A5_up_tidy_HeLa, do_ids = A5_do_tidy_HeLa)

Print the number of differentially spliced events found with the tidy approach.

Code
message('HeLa SRRM2 KD tidy method:',
        '\nUP Exon: ', length(EX_up_tidy_HeLa),
        '\nDOWN Exon: ', length(EX_do_tidy_HeLa),
        '\nUP Intron: ', length(IN_up_tidy_HeLa),
        '\nDOWN Intron: ', length(IN_do_tidy_HeLa),
        "\nDifferentially spliced 3' splice sites: ", length(A3_tidy_HeLa),
        "\nDifferentially spliced 5' splice sites: ", length(A5_tidy_HeLa) )
HeLa SRRM2 KD tidy method:
UP Exon: 17
DOWN Exon: 36
UP Intron: 1
DOWN Intron: 2
Differentially spliced 3' splice sites: 7
Differentially spliced 5' splice sites: 4

The tidy method identifies less events, as it is stringent.

5.2.4 Testing for significance in HepG2 cells

Now repeat the same thing but for HepG2 cells

Code
mrgd_cntrl <- "HepG2_CNTRL"
mrgd_SRRM2 <- "HepG2_SRRM2"

CNTRLKD <- c("HepG2_CNTRL_A", "HepG2_CNTRL_B", "HepG2_CNTRL_C")
SRRM2KD <- c("HepG2_SRRM2_Aa", "HepG2_SRRM2_Ab",
             "HepG2_SRRM2_Ba", "HepG2_SRRM2_Bb",
             "HepG2_SRRM2_Ca", "HepG2_SRRM2_Cb")

Filter for samples that have finite ∆PSI (meaning that is not just NA in one of the mrgd samples) and that have maximum one sample per condition as NA (meaning at least 2 samples are not NA). This is step is done only on the SRRM2 KD in HeLa cells.

Code
te_HepG2_SRRM2 <- tidy_events_HepG2[grepl(pattern = '(CNTRL|SRRM2)', x = tidy_events_HepG2$Sample, perl = T), ]

Filter the data in the same way.

Code
te_HepG2_SRRM2 |>
  group_by(EVENT) |>
  summarise(mPSI_CNTRL = mean(PSI[Sample %in% mrgd_cntrl], na.rm = T),
            mPSI_SRRM2 = mean(PSI[Sample %in% mrgd_SRRM2], na.rm = T),
            
            Samples_NA_CNTRL = sum(is.na(PSI[Sample %in% CNTRLKD])),
            Samples_NA_SRRM2 = sum(is.na(PSI[Sample %in% SRRM2KD])) ) |>
  # calculate ∆PSI
  mutate(dPSI_SRRM2 = mPSI_SRRM2 - mPSI_CNTRL ) |>
  subset( is.finite(dPSI_SRRM2) ) |>
  # select samples with max 1 NA PSI value per event
  subset( Samples_NA_CNTRL <= 1) |>
  subset( Samples_NA_SRRM2 <= 1) |>
  select(EVENT, dPSI_SRRM2) |>
  unique() |>
  # Add the ∆PSI of the filtered events to the original data frame
  left_join(y = te_HepG2_SRRM2, by = join_by(EVENT)) |>
  relocate(GENE, .before = EVENT) |>
  relocate(starts_with('dPSI'), .after = PSI) |>
  # add metadata information
  left_join(y = mtdf, by = join_by(Sample)) |>
  # remove merged super sample PSIs
  subset( !Replicate == 'Merge') |>
  group_by(EVENT) |>
  # for the statistical test select only events that are changing a bit to enter the test
  # if not the data is essentially constant and statistical test fails 
  # this is calculated by measuring the PSI difference standard deviation that should not be zero
  mutate(dSD_SRRM2 = sd(PSI[Sample %in% CNTRLKD] - PSI[Sample %in% SRRM2KD],
                        na.rm = T) ) |>
  subset(dSD_SRRM2 != 0) |>
  select( !c(starts_with("dSD_"), PrettySample, KD, CellType)) |> # remove these columns
  # In addition one could also select the events that have a ∆PSI different from zero.
  # subset( abs(dPSI_SRRM2) >= 0.01 ) |>
  ungroup() -> tidy_dPSI_HepG2

message(length(unique(tidy_dPSI_HepG2$EVENT)), '/', 
        length(unique(tidy_events$EVENT)), ' (', 
        100*signif(length(unique(tidy_dPSI_HepG2$EVENT))/length(unique(tidy_events$EVENT)), 2), '%) ', 
        'filtered events that enter the statistical analysis')
13344/34490 (39%) filtered events that enter the statistical analysis

Calculate P values for SRRM2 KD in HepG2.

Code
tidy_dPSI_HepG2 |> 
  group_by(EVENT) |>
  t_test(formula = PSI ~ PrettyGroup, ref.group = 'HepG2_CNTRL', alternative = "two.sided", 
         paired = F, p.adjust.method = 'none', conf.level = 0.95, detailed = F) |>
  adjust_pvalue(method = "BH") |> add_significance("p.adj") |>
  select(EVENT, p, p.adj, p.adj.signif) |>
  setNames(c('EVENT', 'p_SRRM2_HepG2', 'padj_SRRM2_HepG2', 'padj_signif_SRRM2_HepG2')) -> pvals_SRRM2_HepG2

Combine all data frames into one. If an event as a P value that is NA, discard it.

Code
left_join(x = tidy_dPSI_HepG2, y = pvals_SRRM2_HepG2, by = join_by(EVENT)) |>
  subset(!is.na(p_SRRM2_HepG2)) |>
  subset(!is.na(padj_SRRM2_HepG2)) -> tidy_dPSI_pval_HepG2

Check the P value distribution in HepG2 cells.

Code
tidy_dPSI_pval_HepG2 |>
  select(EVENT, p_SRRM2_HepG2) |>
  pivot_longer(cols = starts_with("p")) |>
  ggplot() +
  aes(x = value, fill = name) +
  geom_histogram(bins = 25, colour = 'black', show.legend = F) + 
  labs(x = "P values") + theme_bw() 

Code
dPSI_squish <- 40

tidy_dPSI_pval_HepG2 <- tidy_dPSI_pval_HepG2 |>
  # label the events that are below significant thresholds
  mutate(DSE_SRRM2_HepG2 = p_SRRM2_HepG2 < pval_thrshld & abs(dPSI_SRRM2) >= dPSI_thrshld ) 

Volcano plot of SRRM2 KD. Points with |∆PSI| > 40 are squished on the sides of the plot.

Code
# select fewer columns for plotting
subset(tidy_dPSI_pval_HepG2) |>
  select(GENE, EVENT, dPSI_SRRM2, p_SRRM2_HepG2, padj_SRRM2_HepG2, DSE_SRRM2_HepG2) |>
  unique() -> plot_HepG2

ggplot(plot_HepG2) +
  aes(x = dPSI_SRRM2, y = -log10(p_SRRM2_HepG2), fill = DSE_SRRM2_HepG2) +
  geom_hline(yintercept = -log10(pval_thrshld), linewidth = 0.1, colour = 'black') +
  geom_vline(xintercept = -dPSI_thrshld, linewidth = 0.1, colour = 'black') +
  geom_vline(xintercept = dPSI_thrshld, linewidth = 0.1, colour = 'black') +
  geom_text_repel(data = subset(plot_HepG2, DSE_SRRM2_HepG2),
                  aes(label = GENE, x = dPSI_SRRM2, y = -log10(p_SRRM2_HepG2)),
                      seed = 16, show.legend = F, segment.curvature = -1e-20,
                      family = "Arial", size = 2, nudge_x = -0.25,
                      segment.color = 'black', verbose = F, lwd = 0.1,
                      box.padding = grid::unit(1, "mm"),
                      point.padding = grid::unit(0.55, "mm"),
                      max.overlaps = 20 ) +
  geom_point(shape = 21, show.legend = F, alpha = 1, stroke = 0.2) +
  scale_fill_manual(values = c('gray84', 'firebrick3') ) +
  scale_x_continuous(n.breaks = 10, oob = scales::squish, limits = c(-dPSI_squish, dPSI_squish),
                     expand = expansion(mult = 0, add = 0.5)) +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.05), add = 0) ) +
  coord_cartesian(ylim = c(0, 6.05)) + # set this to have the 2 volcano plots with the same Y-axis range
  labs( x = "\u0394PSI (SRRM2 KD - Cntrl)", y = expression(-log[10] ~ "P value") ) +
  theme_classic(base_family = 'Arial') +
  theme(axis.text = element_text(colour = 'black'),
        axis.line = element_line(colour = 'black', linewidth = 0.2),
        panel.grid.major = element_line(linewidth = 0.1),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.background = element_blank())

Code
ggsave(filename = '15_tidy_SRRM2_Volcano_HepG2.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 10,
       height = 12)

Select the differentially spliced events in SRRM2 KD HepG2 cells.

Code
subset(tidy_dPSI_pval_HepG2) |>
  select(GENE, EVENT, DSE_SRRM2_HepG2, dPSI_SRRM2, p_SRRM2_HepG2, padj_SRRM2_HepG2, DSE_SRRM2_HepG2) |>
  subset(DSE_SRRM2_HepG2) |>
  unique() -> tidy_signif_HepG2

up_tidy_HepG2 <- subset(tidy_signif_HepG2, dPSI_SRRM2 >= 0)
do_tidy_HepG2 <- subset(tidy_signif_HepG2, dPSI_SRRM2 < 0)

EX_up_tidy_HepG2 <- grep(pattern = "^HsaEX", x = up_tidy_HepG2$EVENT, value = T)
EX_do_tidy_HepG2 <- grep(pattern = "^HsaEX", x = do_tidy_HepG2$EVENT, value = T)

IN_up_tidy_HepG2 <- grep(pattern = "^HsaINT", x = up_tidy_HepG2$EVENT, value = T)
IN_do_tidy_HepG2 <- grep(pattern = "^HsaINT", x = do_tidy_HepG2$EVENT, value = T)

A3_up_tidy_HepG2 <- grep(pattern = "^HsaALTA", x = up_tidy_HepG2$EVENT, value = T)
A3_do_tidy_HepG2 <- grep(pattern = "^HsaALTA", x = do_tidy_HepG2$EVENT, value = T)

A3_tidy_HepG2 <- match_alt35_id(up_ids = A3_up_tidy_HepG2, do_ids = A3_do_tidy_HepG2)

A5_up_tidy_HepG2 <- grep(pattern = "^HsaALTD", x = up_tidy_HepG2$EVENT, value = T)
A5_do_tidy_HepG2 <- grep(pattern = "^HsaALTD", x = do_tidy_HepG2$EVENT, value = T)

A5_tidy_HepG2 <- match_alt35_id(up_ids = A5_up_tidy_HepG2, do_ids = A5_do_tidy_HepG2)

Print the number of DSE found with the tidy method.

Code
message('HepG2 SRRM2 KD tidy method:',
        '\nUP Exon: ', length(EX_up_tidy_HepG2),
        '\nDOWN Exon: ', length(EX_do_tidy_HepG2),
        '\nUP Intron: ', length(IN_up_tidy_HepG2),
        '\nDOWN Intron: ', length(IN_do_tidy_HepG2),
        "\nDifferentially spliced 3' splice sites: ", length(A3_tidy_HepG2),
        "\nDifferentially spliced 5' splice sites: ", length(A5_tidy_HepG2) )
HepG2 SRRM2 KD tidy method:
UP Exon: 114
DOWN Exon: 38
UP Intron: 1
DOWN Intron: 114
Differentially spliced 3' splice sites: 20
Differentially spliced 5' splice sites: 10

5.3 Bayesian method (from diff)

Import the vast-tools diff tables. Note: diff outputs PSIs from 0 to 1, so I multiply them by 100.

This tables were create with this command line short cut:

Code
tail -n +2 Diff_output_all_events_noB3_pIR.tab | \
      awk '{OFS="\t"} { if($6 >= 0.15) print $1, $2, $5, $6}' | \
      sort -k3,4nr  | sed -e $'1iGENE\tEVENT\tdPSI\tMVdPSIat95' > Fltrd

Which filters for events with a minimum value of |∆PSI| >= 15 with a 95% confidence (column number 6) and sorts them by observed ∆PSI (column number 5).

Code
diff_HeLa <- read_vst_tbl(path = diff_HeLa_SRRM2_path, show_col_types = FALSE) |>
  mutate(dPSI = dPSI * 100) |>
  mutate(MVdPSIat95 = MVdPSIat95 * 100) |>
  mutate(COMPLEX = case_when( grepl(x = EVENT, pattern = '^HsaEX') ~ 'S',
                                grepl(x = EVENT, pattern = '^HsaINT') ~ 'IR',
                                grepl(x = EVENT, pattern = '^HsaALTD') ~ 'Alt5',
                                grepl(x = EVENT, pattern = '^HsaALTA') ~ 'Alt3') ) |>
  mutate(comparison = 'HeLa')

diff_HepG2 <- read_vst_tbl(path = diff_HepG2_SRRM2_path, show_col_types = FALSE) |>
  mutate(dPSI = dPSI * 100) |>
  mutate(MVdPSIat95 = MVdPSIat95 * 100) |>
  mutate(COMPLEX = case_when( grepl(x = EVENT, pattern = '^HsaEX') ~ 'S',
                                grepl(x = EVENT, pattern = '^HsaINT') ~ 'IR',
                                grepl(x = EVENT, pattern = '^HsaALTD') ~ 'Alt5',
                                grepl(x = EVENT, pattern = '^HsaALTA') ~ 'Alt3') ) |>
  mutate(comparison = 'HepG2')

How many AS events?

Code
diff_events_HeLa <- unique(diff_HeLa$EVENT)
message(length(diff_events_HeLa), ' diff events in HeLa upon SRRM2 KD')
188 diff events in HeLa upon SRRM2 KD
Code
diff_events_HepG2 <- unique(diff_HepG2$EVENT)
message(length(diff_events_HepG2), ' diff events in HepG2 upon SRRM2 KD')
924 diff events in HepG2 upon SRRM2 KD
Code
diff_events_common <- intersect(diff_events_HeLa, diff_events_HepG2)
message(length(diff_events_common), ' diff events common between both KD')
3 diff events common between both KD
Code
# make a list with the DSE of each KD
diff_events_HeLa_only <- diff_events_HeLa[( !diff_events_HeLa %in% diff_events_common)]
diff_events_HepG2_only <- diff_events_HepG2[( !diff_events_HepG2 %in% diff_events_common)]

Interestingly not many events are common between the 2 KD.

Plot the actual ∆PSI of the events that were filtered to have a minimum value of |∆PSI| >=15 with a 95% confidence.

Code
plot_dPSI_hist_grpd(diff_HeLa) +
  geom_vline(xintercept = -15) + geom_vline(xintercept = 15) 

Code
ggsave(filename = '16_diff_dPSI_SRRM2-KD_HeLa_Histogram.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 12,
       height = 6)

There seem to be a good balance between up and down regulated events.

Code
plot_dPSI_hist_grpd(diff_HepG2) +
  geom_vline(xintercept = -15) +   geom_vline(xintercept = 15) 

Code
ggsave(filename = '17_diff_dPSI_SRRM2-KD_HepG2_Histogram.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 12,
       height = 6)

These seems to be more down regulated AS events in HepG2 SRRM2 KD.

The diff table does not contain all the information about the AS event as the main full inclusion table so I get such info from the full inclusion table.

Code
fetch_diff_IDs <- unique(c(diff_events_HeLa, diff_events_HepG2))
message(length(fetch_diff_IDs), ' events to get info... this may take some time')
1109 events to get info... this may take some time

This next step might take some time to process.

Code
grep_psi(inclusion_tbl = psi_path, vst_id = fetch_diff_IDs) |>
  tidy_vst_psi() -> diff_psi_df

To avoid having to repeat this step in the future I save this dataframe to file and not execute the code chunk above any more.

Code
write_delim(x = diff_psi_df, delim = '\t',
            file = file.path(diff_dir, 'Fltrd_Annotated_Diff_Events_HeLa_Both_OE.tab') ) 

Import the table that was written the first time.

Code
diff_psi_df <- read_vst_tbl(show_col_types = FALSE,
  path = file.path(diff_dir, 'Fltrd_Annotated_Diff_Events_KD.tab') ) |>
  left_join(y = mtdf, by = join_by(Sample))

Separate between the exons and the introns that are up or down spliced in HeLa SRRM2 KD

Code
up_diff_HeLa <- subset(diff_HeLa, dPSI >= 0)
do_diff_HeLa <- subset(diff_HeLa, dPSI < 0)

EX_up_diff_HeLa <- grep(pattern = "^HsaEX", x = up_diff_HeLa$EVENT, value = T)
EX_do_diff_HeLa <- grep(pattern = "^HsaEX", x = do_diff_HeLa$EVENT, value = T)

IN_up_diff_HeLa <- grep(pattern = "^HsaINT", x = up_diff_HeLa$EVENT, value = T)
IN_do_diff_HeLa <- grep(pattern = "^HsaINT", x = do_diff_HeLa$EVENT, value = T)

A3_up_diff_HeLa <- grep(pattern = "^HsaALTA", x = up_diff_HeLa$EVENT, value = T)
A3_do_diff_HeLa <- grep(pattern = "^HsaALTA", x = do_diff_HeLa$EVENT, value = T)

A3_diff_HeLa <- match_alt35_id(up_ids = A3_up_tidy_HeLa, do_ids = A3_do_tidy_HeLa)

A5_up_diff_HeLa <- grep(pattern = "^HsaALTD", x = up_diff_HeLa$EVENT, value = T)
A5_do_diff_HeLa <- grep(pattern = "^HsaALTD", x = do_diff_HeLa$EVENT, value = T)

A5_diff_HeLa <- match_alt35_id(up_ids = A5_up_diff_HeLa, do_ids = A5_do_diff_HeLa)

Print the number.

Code
message('HeLa SRRM2 KD diff method:',
        '\nUP Exon: ', length(EX_up_diff_HeLa),
        '\nDOWN Exon: ', length(EX_do_diff_HeLa),
        '\nUP Intron: ', length(IN_up_diff_HeLa),
        '\nDOWN Intron: ', length(IN_do_diff_HeLa),
        "\nDifferentially spliced 3' splice sites: ", length(A3_diff_HeLa),
        "\nDifferentially spliced 5' splice sites: ", length(A5_diff_HeLa) )
HeLa SRRM2 KD diff method:
UP Exon: 39
DOWN Exon: 33
UP Intron: 42
DOWN Intron: 26
Differentially spliced 3' splice sites: 7
Differentially spliced 5' splice sites: 25

Separate between the exons and the introns that are up or down spliced in HepG2 SRRM2 KD

Code
up_diff_HepG2 <- subset(diff_HepG2, dPSI >= 0)
do_diff_HepG2 <- subset(diff_HepG2, dPSI < 0)

EX_up_diff_HepG2 <- grep(pattern = "^HsaEX", x = up_diff_HepG2$EVENT, value = T)
EX_do_diff_HepG2 <- grep(pattern = "^HsaEX", x = do_diff_HepG2$EVENT, value = T)

IN_up_diff_HepG2 <- grep(pattern = "^HsaINT", x = up_diff_HepG2$EVENT, value = T)
IN_do_diff_HepG2 <- grep(pattern = "^HsaINT", x = do_diff_HepG2$EVENT, value = T)

A3_up_diff_HepG2 <- grep(pattern = "^HsaALTA", x = do_diff_HepG2$EVENT, value = T)
A3_do_diff_HepG2 <- grep(pattern = "^HsaALTA", x = do_diff_HepG2$EVENT, value = T)

A3_diff_HepG2 <- match_alt35_id(up_ids = A3_up_diff_HepG2, do_ids = A3_do_diff_HepG2)

A5_up_diff_HepG2 <- grep(pattern = "^HsaALTD", x = do_diff_HepG2$EVENT, value = T)
A5_do_diff_HepG2 <- grep(pattern = "^HsaALTD", x = do_diff_HepG2$EVENT, value = T)

A5_diff_HepG2 <- match_alt35_id(up_ids = A5_up_diff_HepG2, do_ids = A5_do_diff_HepG2)

Print the number of DSE found with the diff method.

Code
message('HepG2 SRRM2 KD diff method:',
        '\nUP Exon: ', length(EX_up_diff_HepG2),
        '\nDOWN Exon: ', length(EX_do_diff_HepG2),
        '\nUP Intron: ', length(IN_up_diff_HepG2),
        '\nDOWN Intron: ', length(IN_do_diff_HepG2),
        "\nDifferentially spliced 3' splice sites: ", length(A3_diff_HepG2),
        "\nDifferentially spliced 5' splice sites: ", length(A5_diff_HepG2))
HepG2 SRRM2 KD diff method:
UP Exon: 64
DOWN Exon: 133
UP Intron: 63
DOWN Intron: 408
Differentially spliced 3' splice sites: 89
Differentially spliced 5' splice sites: 90

6 SRRM2 expression

Checking the SRRM2 expression across the dataset.

Code
grep_gene_expression(vst_expression_tbl = expr_path, 
                     ensembl_gene_id = 'ENSG00000167978') |>
  tidy_vst_expr(expression_unit = 'TPM' ) |>
  left_join(y = mtdf, by = join_by(Sample) ) |>
  ggplot() +
  aes(x = PrettyGroup, y = log2(Gene_Expr)) +
  geom_boxplot(fill = NA, outlier.shape = NA, lwd = 0.2) +
  geom_point( aes(shape = Replicate, fill = CellType), show.legend = FALSE,
              size = 1.5, stroke = 0.2, position = position_dodge(width = 0.5)) +
  scale_shape_manual(values = c('Merge' = 22, 'A' = 21, 'Aa' = 21, 'Ab' = 21,
                                'B' = 21, 'Ba' = 21, 'Bb' = 21, 
                                'C' = 21, 'Ca' = 21, 'Cb' = 21 ) ) +
  scale_x_discrete(labels = \(x) gsub(pattern = '_', replacement = '\n', x) ) +
  scale_fill_manual(values = sample_palette) +
  coord_cartesian(ylim = c(5, 9)) +
  labs(y = 'Norm. TPMs (log2)') +
  guides(fill = guide_legend(override.aes = list(shape = 21)), shape = 'none') +
  theme_classic() +
    theme(axis.text = element_text(colour = "black"),
          axis.line = element_line(linewidth = 0.2),
          axis.title.x = element_blank(),
          strip.background = element_blank(),
          strip.text = element_text(vjust = 1, lineheight = 0, 
                                    margin = margin(t = 0, unit = 'mm')),
          panel.grid.major = element_line(colour = 'gray84', linewidth = 0.15) ) 

Code
ggsave(filename = '0_SRRM2_Expression_HeLa_HepG2_Boxplot.pdf', plot = last_plot(), 
       device = cairo_pdf, path = plot_dir, units = 'cm', width = 16,
       height = 8)

The square point is the merged replicates into a super sample. The KD seems to work.

7 Overlap of differentially spliced events

Note

Plots should be improved.

7.1 Exons

Code
EX_up_HeLa <- unique(c(EX_up_cmpr_HeLa, EX_up_tidy_HeLa, EX_up_diff_HeLa))
EX_do_HeLa <- unique(c(EX_do_cmpr_HeLa, EX_do_tidy_HeLa, EX_do_diff_HeLa))

EX_up_HepG2 <- unique(c(EX_up_cmpr_HepG2, EX_up_tidy_HepG2, EX_up_diff_HepG2))
EX_do_HepG2 <- unique(c(EX_do_cmpr_HepG2, EX_do_tidy_HepG2, EX_do_diff_HepG2))

dsexons_list <- list(EX_up_HeLa, EX_do_HeLa, EX_up_HepG2, EX_do_HepG2)
 
names(dsexons_list) <- c('EX UP HeLa', 'EX DOWN HeLa', 'EX UP HepG2', 'EX DOWN HepG2')
Code
intersection_exons_ids <- upset(fromList(dsexons_list), text.scale = 1.5, 
                                nintersects = 6, 
                                mb.ratio = c(0.7, 0.3),
                                point.size = 5, order.by = 'freq', keep.order = T,
                                sets = rev(names(dsexons_list)),
                                main.bar.color = "gray16") 

Show the number and overlap of events found by each method with an upset plot:

Code
intersection_exons_ids

Save to pdf.

Code
cairo_pdf(file = file.path(plot_dir, '19_overlap_diff_exons_Upset_HeLa_HepG2.pdf'), 
          width = 5, height = 4.5, bg = NA)
intersection_exons_ids
dev.off()
png 
  2 

7.2 Introns

Code
IN_up_HeLa <- unique(c(IN_up_cmpr_HeLa, IN_up_tidy_HeLa, IN_up_diff_HeLa))
IN_do_HeLa <- unique(c(IN_do_cmpr_HeLa, IN_do_tidy_HeLa, IN_do_diff_HeLa))

IN_up_HepG2 <- unique(c(IN_up_cmpr_HepG2, IN_up_tidy_HepG2, IN_up_diff_HepG2))
IN_do_HepG2 <- unique(c(IN_do_cmpr_HepG2, IN_do_tidy_HepG2, IN_do_diff_HepG2))
 
dsintrons_list <- list(IN_up_HeLa, IN_do_HeLa, IN_up_HepG2, IN_do_HepG2)
names(dsintrons_list) <- c('INT UP HeLa', 'INT DOWN HeLa', 'INT UP HepG2', 'INT DOWN HepG2')

Make upset plot introns

Code
intersection_introns_ids <- upset(fromList(dsintrons_list), text.scale = 1.5, nintersects = NA, 
                                  mb.ratio = c(0.7, 0.3), 
                                  point.size = 5, order.by = 'freq', keep.order = T,
                                  sets = rev(names(dsintrons_list)),
                                  main.bar.color = "gray16") 

Show the number and overlap of events found by each method with an upset plot:

Code
intersection_introns_ids

Code
cairo_pdf(file = file.path(plot_dir, '20_overlap_diff_introns_Upset.pdf'), 
          width = 6, height = 4.5, bg = NA)
intersection_introns_ids
dev.off()
png 
  2 
Warning

There is very little overlap between the events that are differentially spliced in HeLa or HepG2 cells upon SRRM2 knockdown!

7.2.1 Alt 3’ or 5’ splice site

Code
A3_HeLa <- unique(c(A3_cmpr_HeLa, A3_tidy_HeLa, A3_diff_HeLa))
A5_HeLa <- unique(c(A5_cmpr_HeLa, A5_tidy_HeLa, A5_diff_HeLa))

A3_HepG2 <- unique(c(A3_cmpr_HepG2, A3_tidy_HepG2, A3_diff_HepG2))
A5_HepG2 <- unique(c(A5_cmpr_HepG2, A5_tidy_HepG2, A5_diff_HepG2))

dsaltss_list <- list(A3_HeLa, A3_HepG2, A5_HeLa, A5_HepG2)
names(dsaltss_list) <- c('ALT 3SS HeLa', 'ALT 3SS HepG2', 'ALT 5SS HeLa', 'ALT 5SS HepG2')

Make upset plot alternative splice sites

Code
intersection_ss_ids <- upset(fromList(dsaltss_list), text.scale = 1.5, nintersects = NA, 
                                  mb.ratio = c(0.7, 0.3), 
                                  point.size = 5, order.by = 'freq', keep.order = T,
                                  sets = rev(names(dsaltss_list)),
                                  main.bar.color = "gray16") 

Show the number and overlap of events found by each method with an upset plot:

Code
intersection_ss_ids

Code
cairo_pdf(file = file.path(plot_dir, '21_overlap_alt_splice_sites_Upset.pdf'), 
          width = 6.1, height = 5, bg = NA)
intersection_introns_ids
dev.off()
png 
  2 

7.3 By method

Plot how many differentially Spliced Events by method? This code doesn’t separate between exons or introns but only between up or down-regulated events.

Check the number of DSE found by each method.

Code
dse_list_HeLa <- list(up_cmpr_HeLa$EVENT, do_cmpr_HeLa$EVENT,
                      up_tidy_HeLa$EVENT, do_tidy_HeLa$EVENT,
                      up_diff_HeLa$EVENT, do_diff_HeLa$EVENT )
 
names(dse_list_HeLa) <- c('UP HeLa CMPR', 'DOWN HeLa CMPR',
                          'UP HeLa TIDY', 'DOWN HeLa TIDY',
                          'UP HeLa DIFF', 'DOWN HeLa DIFF' )

dse_list_HepG2 <- list(up_cmpr_HepG2$EVENT, do_cmpr_HepG2$EVENT,
                       up_tidy_HepG2$EVENT, do_tidy_HepG2$EVENT,
                       up_diff_HepG2$EVENT, do_diff_HepG2$EVENT)

names(dse_list_HepG2) <- c('UP HepG2 CMPR', 'DOWN HepG2 CMPR',
                          'UP HepG2 TIDY', 'DOWN HepG2 TIDY',
                          'UP HepG2 DIFF', 'DOWN HepG2 DIFF')
Code
intersection_methods_HeLa <- upset(fromList(dse_list_HeLa), nsets = length(dse_list_HeLa),
                                   text.scale = 1.5, nintersects = NA, 
                                   mb.ratio = c(0.6, 0.4), decreasing = c(T, F),
                                   order.by = 'freq', main.bar.color = "gray16") 

Show the number and overlap of events found by each method with an upset plot:

Code
intersection_methods_HeLa

Code
cairo_pdf(file = file.path(plot_dir, '22_overlap_HeLa_methods_ex_introns_Upset.pdf'), 
          width = 6, height = 4, bg = NA)
intersection_methods_HeLa
dev.off()
png 
  2 

Extract the HeLa bona fide (found by 3 different methods) UP and DOWN events

Code
dse_list_HeLa_UP <- list(up_cmpr_HeLa$EVENT, up_tidy_HeLa$EVENT,
                         up_diff_HeLa$EVENT)

# find events that are more included in all 3 methods comparisons
BF_events_HeLa_UP <- Reduce(intersect, lapply(dse_list_HeLa_UP, function(x) x ) )

message("Num. events diff. upregulated in all 3 methods in HeLa: ", length(BF_events_HeLa_UP),
        "\nEvent: ", BF_events_HeLa_UP)
Num. events diff. upregulated in all 3 methods in HeLa: 1
Event: HsaEX0065944
Code
dse_list_HeLa_DO <- list(do_cmpr_HeLa$EVENT, do_tidy_HeLa$EVENT, do_diff_HeLa$EVENT)

# find events that are more skipped in all 3 methods comparisons
BF_events_HeLa_DO <- Reduce(intersect, lapply(dse_list_HeLa_DO, function(x) x ) )

message("Num. events diff. downregulated in all 3 methods in HeLa: ", length(BF_events_HeLa_DO),
        "\nEvents: ", paste0(BF_events_HeLa_DO, collapse = ", ") )
Num. events diff. downregulated in all 3 methods in HeLa: 6
Events: HsaEX0035105, HsaEX0066187, HsaEX0062599, HsaEX0026508, HsaEX0026509, HsaEX0035021

Now for HepG2

Code
intersection_methods_HepG2 <- upset(fromList(dse_list_HepG2), nsets = length(dse_list_HepG2),
                                   text.scale = 1.5, nintersects = NA, 
                                   mb.ratio = c(0.6, 0.4), decreasing = c(T, F),
                                   order.by = 'freq', main.bar.color = "gray16") 

Show the number and overlap of events found by each method with an upset plot:

Code
intersection_methods_HepG2

Save to pdf the HepG2 plot.

Code
cairo_pdf(file = file.path(plot_dir, '22_overlap_HepG2_methods_ex_introns_Upset.pdf'), 
          width = 6, height = 4, bg = NA)
intersection_methods_HepG2
dev.off()
png 
  2 

Extract the HepG2 bona fide (found by 3 different methods) UP and DOWN events

Code
dse_list_HepG2_UP <- list(up_cmpr_HepG2$EVENT, up_tidy_HepG2$EVENT,up_diff_HepG2$EVENT)

# find events that are more included in all 3 methods comparisons
BF_events_HepG2_UP <- Reduce(intersect, lapply(dse_list_HepG2_UP, function(x) x ) )

message("Num. events diff. upregulated in all 3 methods in HepG2: ", length(BF_events_HepG2_UP),
        "\nEvent: ", paste0(BF_events_HepG2_UP, collapse = ", ") )
Num. events diff. upregulated in all 3 methods in HepG2: 5
Event: HsaEX0041240, HsaEX0066371, HsaEX0037169, HsaEX0028178, HsaEX0028282
Code
dse_list_HepG2_DO <- list(do_cmpr_HepG2$EVENT, do_tidy_HepG2$EVENT, do_diff_HepG2$EVENT)

# find events that are more skipped in all 3 methods comparisons
BF_events_HepG2_DO <- Reduce(intersect, lapply(dse_list_HepG2_DO, function(x) x ) )
message("Num. events diff. downregulated in all 3 methods in HepG2: ", length(dse_list_HepG2_DO) )
Num. events diff. downregulated in all 3 methods in HepG2: 3

Select the best bona fide events that are up or down regulated in HeLa and HepG2 cells.

Code
Best_SRRMS_events_IDs <- c(BF_events_HeLa_UP, BF_events_HeLa_DO, BF_events_HepG2_UP, BF_events_HepG2_DO) |> unique()
length(Best_SRRMS_events_IDs)
[1] 70
Code
Best_SRRMS_events_IDs_df <- data.frame(DSE_robust_HeLa_HepG2 = sort(Best_SRRMS_events_IDs))

Save these exons and introns to file in case someone will need them in the future

Code
write_delim(x = Best_SRRMS_events_IDs_df, col_names = T, append = F, 
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, 'Bona_Fide_Diff_Spliced_Exons_Introns_SRRM2-KD_HeLa_HepG2.tab'),
            delim = '\t')
Note

In conclusion when analysing SRRM2 knockdown RNA-seq in HeLa and HepG2 from 2 different publications and using 3 different filtering methods to call for differentially spliced exons and introns, I found that only 70 exons and introns are reliably detected in both experiments in all 3 analysis methods.

8 Export DSE and data

Note

Export the significant events and look for features using Matt.

8.1 Exons

Calculate the intersection of up or down regulated exons between HeLa and HepG2 cells as well as the uniquely mis-spliced exons in each cell line.

Code
EX_up_common_HH <- intersect( EX_up_HeLa,  EX_up_HepG2 )
EX_do_common_HH <- intersect( EX_up_HepG2, EX_do_HepG2 )

EX_up_uniq_HeLa <- EX_up_HeLa[ !(EX_up_HeLa %in% c(EX_up_common_HH, EX_up_HepG2) )]
EX_do_uniq_HeLa <- EX_do_HeLa[ !(EX_do_HeLa %in% c(EX_do_common_HH, EX_do_HepG2) )] 

EX_up_uniq_HepG2 <- EX_up_HepG2[ !(EX_up_HepG2 %in% c(EX_up_common_HH, EX_up_HeLa) )]

EX_do_uniq_HepG2 <- EX_do_HepG2[ !(EX_do_HepG2 %in% c(EX_do_common_HH, EX_up_HeLa) )] 

Export to file the differentially spliced exons IDs

Code
write_delim(x = as.data.frame(EX_up_common_HH), col_names = F, append = F, 
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '1_UP_EXONS_BOTH.tab'), delim = '\t')

write_delim(x = as.data.frame(EX_do_common_HH), col_names = F, append = F, 
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '2_DOWN_EXONS_BOTH.tab'), delim = '\t')

write_delim(x = as.data.frame(EX_up_uniq_HeLa), col_names = F, append = F, 
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '3_UP_EXONS_HeLa.tab'), delim = '\t')

write_delim(x = as.data.frame(EX_do_uniq_HeLa), col_names = F, append = F,
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '4_DOWN_EXONS_HeLa.tab'), delim = '\t')

write_delim(x = as.data.frame(EX_up_uniq_HepG2), col_names = F, append = F,
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '5_UP_EXONS_HepG2.tab'), delim = '\t')

write_delim(x = as.data.frame(EX_do_uniq_HepG2), col_names = F, append = F,
            quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '6_DOWN_EXONS_HepG2.tab'), delim = '\t')

8.2 Introns

Calculate the intersection of up or down regulated introns between HeLa and HepG2 cells as well as the uniquely mis-spliced introns in each cell line.

Code
IN_up_common_HH <- intersect(IN_up_HeLa, IN_up_HepG2)
IN_do_common_HH <- intersect(IN_do_HeLa, IN_do_HepG2)

IN_up_uniq_HeLa <- IN_up_HeLa[ !(IN_up_HeLa %in% c(IN_up_common_HH, IN_up_HepG2)) ]
IN_do_uniq_HeLa <- IN_do_HeLa[ !(IN_do_HeLa %in% c(IN_do_common_HH, IN_do_HepG2)) ] 

IN_up_uniq_HepG2 <- IN_up_HepG2[ !(IN_up_HepG2 %in% c(IN_up_common_HH, IN_up_HeLa)) ]
IN_do_uniq_HepG2 <- IN_do_HepG2[ !(IN_do_HepG2 %in% c(IN_do_common_HH, IN_do_HeLa)) ] 

Export to file the differentially spliced introns IDs

Code
write_delim(x = as.data.frame(IN_up_common_HH), col_names = F, append = F, 
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '1_UP_INTRONS_BOTH.tab') )

write_delim(x = as.data.frame(IN_do_common_HH), col_names = F, append = F,
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '2_DOWN_INTRONS_BOTH.tab') )

write_delim(x = as.data.frame(IN_up_uniq_HeLa), col_names = F, append = F,
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '3_UP_INTRONS_HeLa.tab') )

write_delim(x = as.data.frame(IN_do_uniq_HeLa), col_names = F, append = F,
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '4_DOWN_INTRONS_HeLa.tab') )

write_delim(x = as.data.frame(IN_up_uniq_HepG2), col_names = F, append = F,
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '5_UP_INTRONS_HepG2.tab') )

write_delim(x = as.data.frame(IN_do_uniq_HepG2), col_names = F, append = F,
            quote = 'none', escape = 'none', delim = '\t',
            file = file.path(dse_IDs_dir, '6_DOWN_INTRONS_HepG2.tab'))

8.3 Alternative 3’ and 5’ splice sites

Code
A3_common <- A3_HeLa[(A3_HeLa %in% A3_HepG2)] 
A5_common <- A5_HeLa[(A5_HeLa %in% A5_HepG2)] 

A3_uniqHeLa <- A3_HeLa[!A3_HeLa %in% A3_HepG2]
stopifnot(length(A3_uniqHeLa) + length(A3_common) == length(A3_HeLa))

A3_uniqHepG2 <- A3_HepG2[!(A3_HepG2 %in% A3_HeLa)]
stopifnot(length(A3_uniqHepG2) + length(A3_common) == length(A3_HepG2))

A5_uniqHeLa <- A5_HeLa[!A5_HeLa %in% A5_HepG2]
stopifnot(length(A5_uniqHeLa) + length(A5_common) == length(A5_HeLa))

A5_uniqHepG2 <- A5_HepG2[!(A5_HepG2 %in% A5_HeLa)]
stopifnot(length(A5_uniqHepG2) + length(A5_common) == length(A5_HepG2))

Export to file the differentially spliced splice sites IDs

Code
write_delim(x = as.data.frame(A3_common), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '1_ALT_ACCEPTOR_SPLICE_SITES_BOTH.tab'), delim = '\t')

write_delim(x = as.data.frame(A5_common), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '2_ALT_DONOR_SPLICE_SITES_BOTH.tab'), delim = '\t')

write_delim(x = as.data.frame(A3_uniqHeLa), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '3_ALT_ACCEPTOR_SPLICE_SITES_HeLa.tab'), delim = '\t')

write_delim(x = as.data.frame(A3_uniqHepG2), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '4_ALT_ACCEPTOR_SPLICE_SITES_HepG2.tab'), delim = '\t')

write_delim(x = as.data.frame(A5_uniqHeLa), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '5_ALT_DONOR_SPLICE_SITES_HeLa.tab'), delim = '\t')

write_delim(x = as.data.frame(A5_uniqHepG2), col_names = F, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '6_ALT_DONOR_SPLICE_SITES_HepG2.tab'), delim = '\t')

8.4 Combine all into one dataframe

Code
rbind(
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Exon",
             EVENT = c(EX_up_common_HH, EX_up_uniq_HeLa),
             DIRECTION = 'UP'),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Exon",
             EVENT = c(EX_up_common_HH, EX_up_uniq_HepG2),
             DIRECTION = 'UP'),
  
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Exon",
             EVENT = c(EX_do_common_HH, EX_do_uniq_HeLa),
             DIRECTION = 'DOWN'),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Exon",
             EVENT = c(EX_do_common_HH, EX_do_uniq_HepG2),
             DIRECTION = 'DOWN')
) -> exons_IDs

rbind(
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Intron",
             EVENT = c(IN_up_common_HH, IN_up_uniq_HeLa),
             DIRECTION = 'UP'),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Intron",
             EVENT = c(IN_up_common_HH, IN_up_uniq_HepG2),
             DIRECTION = 'UP'),
  
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Intron",
             EVENT = c(IN_do_common_HH, IN_do_uniq_HeLa),
             DIRECTION = 'DOWN'),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Intron",
             EVENT = c(IN_do_common_HH, IN_do_uniq_HepG2),
             DIRECTION = 'DOWN')
) -> introns_IDs

rbind(
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Alt. 3' ss",
             EVENT = c(A3_common, A3_uniqHeLa)),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Alt. 3' ss",
             EVENT = c(A3_common, A3_uniqHepG2)),
  
  data.frame(EXPERIMENT = 'HeLa-SRRM2-KD',
             TYPE = "Alt. 5' ss",
             EVENT = c(A5_common, A5_uniqHeLa)),
  
  data.frame(EXPERIMENT = 'HepG2-SRRM2-KD',
             TYPE = "Alt. 5' ss",
             EVENT = c(A5_common, A5_uniqHepG2))
) -> alt_splice_sites_IDs

alt_splice_sites_IDs$DIRECTION = 'BOTH'

shared_and_uniq_AS_events <- rbind(exons_IDs, introns_IDs, alt_splice_sites_IDs)

Save to file

Code
write_delim(x = shared_and_uniq_AS_events, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS.tab'), delim = '\t')

8.5 Annotate the differentially spliced events

Code
SHR_UNI_AS_events <- read_delim(file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS.tab'),
                                delim = '\t', quote = 'none', 
                                col_names = T, show_col_types = FALSE)

Annotate these events by fetching the PSI and coordinates from the main inclusion table.

Code
fetch_diff_IDs <- unique(SHR_UNI_AS_events$EVENT)
message(length(fetch_diff_IDs), ' events to get info... this may take some time')
4071 events to get info... this may take some time

This next step might take some time to process.

Code
grep_psi(inclusion_tbl = psi_path, vst_id = fetch_diff_IDs) |>
  tidy_vst_psi() -> shr_uni_psi_df

Merge all the differentially spliced events with the annotation data. 3.

Code
DSE_tbl <- right_join(x = SHR_UNI_AS_events, y = shr_uni_psi_df, by = join_by(EVENT), multiple = "all")
Warning in right_join(x = SHR_UNI_AS_events, y = shr_uni_psi_df, by = join_by(EVENT), : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 32267 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Export the annotated differentially spliced events.

Code
stopifnot( length(unique(SHR_UNI_AS_events$EVENT)) == length(unique(DSE_tbl$EVENT)) )

write_delim(x = DSE_tbl, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS_PSI.tab'),
            delim = '\t')

Import this table instead of processing it every time.

Code
DSE_tbl <- read_delim(file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS_PSI.tab'),
                                delim = '\t', quote = 'none', 
                                col_names = T, show_col_types = FALSE)

Filter out all the samples not needed in this analysisi.

Code
DSE_tbl_HeLa <- subset(DSE_tbl, EXPERIMENT == "HeLa-SRRM2-KD") |>
  subset(!grepl(Sample, pattern = '^HepG2')) |>
  subset(grepl(Sample, pattern = 'CNTRL|SRRM2'))

DSE_tbl_HepG2 <- subset(DSE_tbl, EXPERIMENT == "HepG2-SRRM2-KD") |>
  subset(!grepl(Sample, pattern = '^HeLa')) |>
  subset(grepl(Sample, pattern = 'CNTRL|SRRM2'))

Define samples and supersample groups in the HeLa dataset

Code
mrgd_cntrl <- "HeLa_CNTRL"
mrgd_SRRM2 <- "HeLa_SRRM2"

CNTRLKD <- c("HeLa_CNTRL_A", "HeLa_CNTRL_B", "HeLa_CNTRL_C")
SRRM2KD <- c("HeLa_SRRM2_A", "HeLa_SRRM2_B", "HeLa_SRRM2_C")

Calculate ∆PSIs for TAF dataset.

Code
dPSI_thshld <- 15

DSE_tbl_HeLa |>
  group_by(EVENT) |>
  summarise(mPSI_CNTRL = mean(PSI[Sample %in% mrgd_cntrl], na.rm = T),
            mPSI_SRRM2 = mean(PSI[Sample %in% mrgd_SRRM2], na.rm = T) ) |>
  mutate(dPSI_SRRM2 = mPSI_SRRM2 - mPSI_CNTRL ) |>
  subset( is.finite(dPSI_SRRM2) ) |>
  select(EVENT, dPSI_SRRM2) |>
  unique() |>
  # Add the ∆PSI of the filtered events to the original data frame
  left_join(y = DSE_tbl_HeLa, by = join_by(EVENT)) |>
  relocate(GENE, .before = EVENT) |>
  relocate(starts_with('dPSI'), .after = PSI) |>
  select(GENE, EVENT, starts_with('dPSI')) |>
  rename(dPSI_SRRM2_HeLa = dPSI_SRRM2) |>
  unique() |>
  mutate(AS_TYPE = case_when( grepl(pattern = "^HsaALTA", x = EVENT) ~ 'Alt. Acceptor',
                              grepl(pattern = "^HsaALTD", x = EVENT) ~ 'Alt. Donor',
                              grepl(pattern = "^HsaINT", x = EVENT) ~ 'Intron',
                              grepl(pattern = "^HsaEX", x = EVENT) ~ 'Exon') ) |>
  relocate(AS_TYPE, .after = EVENT) |>
  mutate(direction_SRRM2_HeLa = case_when( dPSI_SRRM2_HeLa >= dPSI_thshld ~ 'UP',
                                           dPSI_SRRM2_HeLa <= -dPSI_thshld ~ 'DOWN',
                                           between(dPSI_SRRM2_HeLa, left = -dPSI_thshld, right = dPSI_thshld) ~ 'NONE') ) |>
  mutate(direction_SRRM2_HeLa = factor(direction_SRRM2_HeLa, levels = c('UP', 'DOWN', 'NONE'))) |>
  arrange(direction_SRRM2_HeLa, desc(dPSI_SRRM2_HeLa)) -> tidy_dPSI_SRRM2_HeLa

Note: these ∆PSIs are are calculated from the mean ∆PSI, similarly to how vast-tools compare would calcualte them, so the events that are found differentially spliced with the diff method are labelled as “NONE” because they are below 15 PSI.

Code
head(subset(tidy_dPSI_SRRM2_HeLa, direction_SRRM2_HeLa == "NONE"))
# A tibble: 6 × 5
  GENE   EVENT              AS_TYPE       dPSI_SRRM2_HeLa direction_SRRM2_HeLa
  <chr>  <chr>              <chr>                   <dbl> <fct>               
1 CUX1   HsaEX6009566       Exon                     14.7 NONE                
2 NAB1   HsaALTA0005536-1/2 Alt. Acceptor            14.4 NONE                
3 KLHL22 HsaEX0034811       Exon                     14.3 NONE                
4 EML3   HsaINT0055976      Intron                   14.2 NONE                
5 CASP3  HsaEX0012576       Exon                     14.2 NONE                
6 FBXW11 HsaEX0025363       Exon                     14.1 NONE                

Export the annotated differentially spliced events.

Code
write_delim(x = tidy_dPSI_SRRM2_HeLa, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS_dPSI_HeLa.tab'),
            delim = '\t')

Define HepG2 samples

Code
mrgd_cntrl <- "HepG2_CNTRL"
mrgd_SRRM2 <- "HepG2_SRRM2"

CNTRLKD <- c("HepG2_CNTRL_A", "HepG2_CNTRL_B", "HepG2_CNTRL_C")
SRRM2KD <- c("HepG2_SRRM2_Aa", "HepG2_SRRM2_Ab",
             "HepG2_SRRM2_Ba", "HepG2_SRRM2_Bb",
             "HepG2_SRRM2_Ca", "HepG2_SRRM2_Cb")

Calculate ∆PSI in HepG2 cells.

Code
DSE_tbl_HepG2 |>
  group_by(EVENT) |>
  summarise(mPSI_CNTRL = mean(PSI[Sample %in% mrgd_cntrl], na.rm = T),
            mPSI_SRRM2 = mean(PSI[Sample %in% mrgd_SRRM2], na.rm = T) ) |>
  # calculate ∆PSI
  mutate(dPSI_SRRM2 = mPSI_SRRM2 - mPSI_CNTRL ) |>
  subset( is.finite(dPSI_SRRM2) ) |>
  select(EVENT, dPSI_SRRM2) |>
  unique() |>
  # Add the ∆PSI of the filtered events to the original data frame
  left_join(y = DSE_tbl_HepG2, by = join_by(EVENT)) |>
  relocate(GENE, .before = EVENT) |>
  relocate(starts_with('dPSI'), .after = PSI) |>
  select(GENE, EVENT, starts_with('dPSI')) |>
  rename(dPSI_SRRM2_HepG2 = dPSI_SRRM2) |>
  unique() |>
  mutate(AS_TYPE = case_when( grepl(pattern = "^HsaALTA", x = EVENT) ~ 'Alt. Acceptor',
                              grepl(pattern = "^HsaALTD", x = EVENT) ~ 'Alt. Donor',
                              grepl(pattern = "^HsaINT", x = EVENT) ~ 'Intron',
                              grepl(pattern = "^HsaEX", x = EVENT) ~ 'Exon') ) |>
  relocate(AS_TYPE, .after = EVENT) |>
  mutate(direction_SRRM2_HepG2 = case_when( dPSI_SRRM2_HepG2 >= dPSI_thshld ~ 'UP',
                                       dPSI_SRRM2_HepG2 <= -dPSI_thshld ~ 'DOWN',
                                           between(dPSI_SRRM2_HepG2, left = -dPSI_thshld, right = dPSI_thshld) ~ 'NONE') ) |>
  mutate(direction_SRRM2_HepG2 = factor(direction_SRRM2_HepG2, levels = c('UP', 'DOWN', 'NONE'))) |>
  arrange(direction_SRRM2_HepG2, desc(dPSI_SRRM2_HepG2)) -> tidy_dPSI_SRRM2_HepG2

Export the annotated differentially spliced events.

Code
write_delim(x = tidy_dPSI_SRRM2_HepG2, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_ALL_SHARED_UNIQUE_ALT_SPLICED_EVENTS_dPSI_HepG2.tab'),
            delim = '\t')

Generate a complete supplementary table for the paper.

Code
Supp_Tbl_SRRM2_HeLa <-
    left_join(x = tidy_dPSI_SRRM2_HeLa, y = DSE_tbl_HeLa, by = join_by(EVENT, GENE) ) |>
    select( !c(TYPE, Quality_Score_Type, direction_SRRM2_HeLa) ) |>
    relocate(EVENT, .after = GENE) |>
    relocate(AS_TYPE, .after = COMPLEX) |>
    relocate(EXPERIMENT, .before = GENE) |>
    relocate(dPSI_SRRM2_HeLa, .after = PSI) |>
    relocate(DIRECTION, .after = Quality_Score_Value) |>
    unique() |>
    arrange(AS_TYPE, desc(dPSI_SRRM2_HeLa))

The direction column indicates which event are up or down regulated in SRRM2-KD alone or are both up&down regulated since they are alternative splice sites.

Code
Supp_Tbl_SRRM2_HeLa |>
    select(EVENT, DIRECTION) |>
    unique() |>
    pull(DIRECTION) |>
    table()

BOTH DOWN   UP 
 374  543  544 
Code
Supp_Tbl_SRRM2_HeLa |>
    select(EVENT, AS_TYPE) |>
    unique() |>
    pull(AS_TYPE) |>
    table()

Alt. Acceptor    Alt. Donor          Exon        Intron 
          183           191           824           262 

Write to file as a tab delimited file.

Code
write_delim(x = Supp_Tbl_SRRM2_HeLa, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_HeLa_Supplemtentary_Table_DSE_SHARED_UNIQUE.tab'),
            delim = '\t')

Now create a supplementary table for the HepG2, just in case it will be needed.

Code
Supp_Tbl_SRRM2_HepG2 <- 
    left_join(x = tidy_dPSI_SRRM2_HepG2, y = DSE_tbl_HepG2, , by = join_by(EVENT, GENE) ) |>
    select(!c(TYPE, Quality_Score_Type, direction_SRRM2_HepG2)) |>
    relocate(EVENT, .after = GENE) |>
    relocate(AS_TYPE, .after = COMPLEX) |>
    relocate(EXPERIMENT, .before = GENE) |>
    relocate(dPSI_SRRM2_HepG2, .after = PSI) |>
    relocate(DIRECTION, .after = Quality_Score_Value) |>
    unique() |>
    arrange(AS_TYPE, desc(dPSI_SRRM2_HepG2))

The direction column indicates which event are up or down regulated in SRRM2-KD alone or are both up&down regulated since they are alternative splice sites.

Code
Supp_Tbl_SRRM2_HepG2 |>
    select(EVENT, DIRECTION) |>
    unique() |>
    pull(DIRECTION) |>
    table()

BOTH DOWN   UP 
 735 1158  773 
Code
Supp_Tbl_SRRM2_HepG2 |>
    select(EVENT, AS_TYPE) |>
    unique() |>
    pull(AS_TYPE) |>
    table()

Alt. Acceptor    Alt. Donor          Exon        Intron 
          381           354           991           940 

Write to file as a tab delimited file.

Code
write_delim(x = Supp_Tbl_SRRM2_HepG2, col_names = T, append = F, quote = 'none', escape = 'none',
            file = file.path(dse_IDs_dir, '0_HepG2_Supplemtentary_Table_DSE_SHARED_UNIQUE.tab'),
            delim = '\t')

9 Conclusion

Note

The interesting observation is that there’s a little overlap between between HeLa and HepG2 SRRM2 KD.

Further exploration would be required to understand if this is a cell-line specific effect of a different batch issue. Analyzing SRRM2-KD in other cellular context could shed some light on this observation.

10 Session Info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.4.2 (2024-10-31)
 os       AlmaLinux 9.5 (Teal Serval)
 system   x86_64, linux-gnu
 ui       X11
 language en_US.UTF-8
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Madrid
 date     2025-03-03
 pandoc   3.4 @ /users/mirimia/narecco/software/miniconda/envs/analysis/bin/ (via rmarkdown)
 quarto   1.6.41 @ /users/mirimia/narecco/software/miniconda/envs/analysis/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 ! package              * version  date (UTC) lib source
   abind                  1.4-5    2016-07-21 [1] CRAN (R 4.4.1)
   ade4                   1.7-23   2025-02-14 [1] CRAN (R 4.4.2)
   AnnotationDbi          1.68.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   backports              1.5.0    2024-05-23 [1] CRAN (R 4.4.1)
   Biobase                2.66.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   BiocFileCache          2.14.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   BiocGenerics           0.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   BiocParallel           1.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   biomaRt                2.62.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   Biostrings             2.74.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   bit                    4.5.0.1  2024-12-03 [1] CRAN (R 4.4.2)
   bit64                  4.5.2    2024-09-22 [1] CRAN (R 4.4.1)
   bitops                 1.0-9    2024-10-03 [1] CRAN (R 4.4.1)
   blob                   1.2.4    2023-03-17 [1] CRAN (R 4.4.1)
   broom                  1.0.7    2024-09-26 [1] CRAN (R 4.4.1)
   cachem                 1.1.0    2024-05-16 [1] CRAN (R 4.4.1)
   Cairo                * 1.6-2    2023-11-28 [1] CRAN (R 4.4.1)
   car                    3.1-3    2024-09-27 [1] CRAN (R 4.4.1)
   carData                3.0-5    2022-01-06 [1] CRAN (R 4.4.1)
   cli                    3.6.4    2025-02-13 [1] CRAN (R 4.4.2)
   codetools              0.2-20   2024-03-31 [1] CRAN (R 4.4.1)
   colorspace             2.1-1    2024-07-26 [1] CRAN (R 4.4.1)
   crayon                 1.5.3    2024-06-20 [1] CRAN (R 4.4.1)
   csaw                   1.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   curl                   6.0.1    2024-11-14 [1] CRAN (R 4.4.2)
   DBI                    1.2.3    2024-06-02 [1] CRAN (R 4.4.1)
   dbplyr                 2.5.0    2024-03-19 [1] CRAN (R 4.4.1)
   DelayedArray           0.32.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   desc                   1.4.3    2023-12-10 [1] CRAN (R 4.4.1)
   DESeq2                 1.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   devtools               2.4.5    2022-10-11 [1] CRAN (R 4.4.1)
   digest                 0.6.37   2024-08-19 [1] CRAN (R 4.4.1)
   dplyr                  1.1.4    2023-11-17 [1] CRAN (R 4.4.1)
   edgeR                  4.4.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.4.1)
   evaluate               1.0.3    2025-01-10 [1] CRAN (R 4.4.2)
   farver                 2.1.2    2024-05-13 [1] CRAN (R 4.4.1)
   fastmap                1.2.0    2024-05-15 [1] CRAN (R 4.4.1)
   filelock               1.0.3    2023-12-11 [1] CRAN (R 4.4.1)
   forcats                1.0.0    2023-01-29 [1] CRAN (R 4.4.1)
   foreign                0.8-88   2025-01-12 [1] CRAN (R 4.4.2)
   Formula                1.2-5    2023-02-24 [1] CRAN (R 4.4.1)
   fs                     1.6.5    2024-10-30 [1] CRAN (R 4.4.1)
   generics               0.1.3    2022-07-05 [1] CRAN (R 4.4.1)
   GenomeInfoDb           1.42.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   GenomeInfoDbData       1.2.13   2025-03-03 [1] Bioconductor
   GenomicRanges          1.58.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   ggalluvial             0.12.5   2023-02-22 [1] CRAN (R 4.4.1)
   ggfittext              0.10.2   2024-02-01 [1] CRAN (R 4.4.1)
   ggforce              * 0.4.2    2024-02-19 [1] CRAN (R 4.4.1)
   ggplot2              * 3.5.1    2024-04-23 [1] CRAN (R 4.4.1)
   ggrepel              * 0.9.6    2024-09-07 [1] CRAN (R 4.4.1)
   ggseqlogo              0.2      2024-02-09 [1] CRAN (R 4.4.1)
   glue                   1.8.0    2024-09-30 [1] CRAN (R 4.4.1)
   gridExtra              2.3      2017-09-09 [1] CRAN (R 4.4.1)
   gtable                 0.3.6    2024-10-25 [1] CRAN (R 4.4.1)
   gtools                 3.9.5    2023-11-20 [1] CRAN (R 4.4.1)
   hms                    1.1.3    2023-03-21 [1] CRAN (R 4.4.1)
   htmltools              0.5.8.1  2024-04-04 [1] CRAN (R 4.4.1)
   htmlwidgets            1.6.4    2023-12-06 [1] CRAN (R 4.4.1)
   httpuv                 1.6.15   2024-03-26 [1] CRAN (R 4.4.1)
   httr                   1.4.7    2023-08-15 [1] CRAN (R 4.4.1)
   httr2                  1.1.0    2025-01-18 [1] CRAN (R 4.4.2)
   IRanges                2.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   jsonlite               1.9.0    2025-02-19 [1] CRAN (R 4.4.2)
   KEGGREST               1.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   knitr                  1.49     2024-11-08 [1] CRAN (R 4.4.1)
   labeling               0.4.3    2023-08-29 [1] CRAN (R 4.4.1)
   later                  1.4.1    2024-11-27 [1] CRAN (R 4.4.2)
   lattice                0.22-6   2024-03-20 [1] CRAN (R 4.4.1)
   lifecycle              1.0.4    2023-11-07 [1] CRAN (R 4.4.1)
   limma                  3.62.1   2024-11-03 [1] Bioconductor 3.20 (R 4.4.2)
   locfit                 1.5-9.11 2025-02-03 [1] CRAN (R 4.4.2)
   magrittr               2.0.3    2022-03-30 [1] CRAN (R 4.4.1)
   MASS                   7.3-64   2025-01-04 [1] CRAN (R 4.4.2)
   Matrix                 1.7-2    2025-01-23 [1] CRAN (R 4.4.2)
   MatrixGenerics         1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   matrixStats            1.5.0    2025-01-07 [1] CRAN (R 4.4.2)
   memoise                2.0.1    2021-11-26 [1] CRAN (R 4.4.1)
   metapod                1.14.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   MetBrewer              0.2.0    2022-03-21 [1] CRAN (R 4.4.1)
   mime                   0.12     2021-09-28 [1] CRAN (R 4.4.1)
   miniUI                 0.1.1.1  2018-05-18 [1] CRAN (R 4.4.1)
   msa                    1.38.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   munsell                0.5.1    2024-04-01 [1] CRAN (R 4.4.1)
 R niar                 * 0.3.0    <NA>       [?] <NA>
   patchwork              1.3.0    2024-09-16 [1] CRAN (R 4.4.1)
   pheatmap             * 1.0.12   2019-01-04 [1] CRAN (R 4.4.1)
   pillar                 1.10.1   2025-01-07 [1] CRAN (R 4.4.2)
   pkgbuild               1.4.6    2025-01-16 [1] CRAN (R 4.4.2)
   pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.4.1)
   pkgload                1.4.0    2024-06-28 [1] CRAN (R 4.4.1)
   plyr                   1.8.9    2023-10-02 [1] CRAN (R 4.4.1)
   png                    0.1-8    2022-11-29 [1] CRAN (R 4.4.1)
   polyclip               1.10-7   2024-07-23 [1] CRAN (R 4.4.1)
   prettyunits            1.2.0    2023-09-24 [1] CRAN (R 4.4.1)
   profvis                0.4.0    2024-09-20 [1] CRAN (R 4.4.1)
   progress               1.2.3    2023-12-06 [1] CRAN (R 4.4.1)
   promises               1.3.2    2024-11-28 [1] CRAN (R 4.4.2)
   psychTools             2.4.2    2024-01-18 [1] CRAN (R 4.4.1)
   purrr                  1.0.4    2025-02-05 [1] CRAN (R 4.4.2)
   R6                     2.6.1    2025-02-15 [1] CRAN (R 4.4.2)
   rappdirs               0.3.3    2021-01-31 [1] CRAN (R 4.4.1)
   RColorBrewer           1.1-3    2022-04-03 [1] CRAN (R 4.4.1)
   Rcpp                   1.0.14   2025-01-12 [1] CRAN (R 4.4.2)
   readr                  2.1.5    2024-01-10 [1] CRAN (R 4.4.1)
   remotes                2.5.0    2024-03-17 [1] CRAN (R 4.4.1)
   rlang                  1.1.5    2025-01-17 [1] CRAN (R 4.4.2)
   rmarkdown              2.29     2024-11-04 [1] CRAN (R 4.4.1)
   rprojroot              2.0.4    2023-11-05 [1] CRAN (R 4.4.1)
   Rsamtools              2.22.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   RSQLite                2.3.9    2024-12-03 [1] CRAN (R 4.4.2)
   rstatix              * 0.7.2    2023-02-01 [1] CRAN (R 4.4.1)
   rstudioapi             0.17.1   2024-10-22 [1] CRAN (R 4.4.1)
   S4Arrays               1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   S4Vectors              0.44.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   scales                 1.3.0    2023-11-28 [1] CRAN (R 4.4.1)
   seqinr                 4.2-36   2023-12-08 [1] CRAN (R 4.4.1)
   sessioninfo            1.2.3    2025-02-05 [1] CRAN (R 4.4.2)
   shiny                  1.10.0   2024-12-14 [1] CRAN (R 4.4.2)
   SparseArray            1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   statmod                1.5.0    2023-01-06 [1] CRAN (R 4.4.1)
   stringi                1.8.4    2024-05-06 [1] CRAN (R 4.4.1)
   stringr                1.5.1    2023-11-14 [1] CRAN (R 4.4.1)
   SummarizedExperiment   1.36.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   tibble                 3.2.1    2023-03-20 [1] CRAN (R 4.4.1)
   tidyr                  1.3.1    2024-01-24 [1] CRAN (R 4.4.1)
   tidyselect             1.2.1    2024-03-11 [1] CRAN (R 4.4.1)
   tweenr                 2.0.3    2024-02-26 [1] CRAN (R 4.4.1)
   tzdb                   0.4.0    2023-05-12 [1] CRAN (R 4.4.1)
   UCSC.utils             1.2.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   UpSetR               * 1.4.0    2019-05-22 [1] CRAN (R 4.4.1)
   urlchecker             1.0.1    2021-11-30 [1] CRAN (R 4.4.1)
   usethis                3.1.0    2024-11-26 [1] CRAN (R 4.4.2)
   utf8                   1.2.4    2023-10-22 [1] CRAN (R 4.4.1)
   vctrs                  0.6.5    2023-12-01 [1] CRAN (R 4.4.1)
   viridis              * 0.6.5    2024-01-29 [1] CRAN (R 4.4.1)
   viridisLite          * 0.4.2    2023-05-02 [1] CRAN (R 4.4.1)
   vroom                  1.6.5    2023-12-05 [1] CRAN (R 4.4.1)
   withr                  3.0.2    2024-10-28 [1] CRAN (R 4.4.1)
   xfun                   0.51     2025-02-19 [1] CRAN (R 4.4.2)
   XICOR                  0.4.1    2023-04-21 [1] CRAN (R 4.4.1)
   xml2                   1.3.6    2023-12-04 [1] CRAN (R 4.4.1)
   xtable                 1.8-4    2019-04-21 [1] CRAN (R 4.4.1)
   XVector                0.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
   yaml                   2.3.10   2024-07-26 [1] CRAN (R 4.4.1)
   zlibbioc               1.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)

 [1] /users/mirimia/narecco/software/miniconda/envs/analysis/lib/R/library

 * ── Packages attached to the search path.
 R ── Package was removed from disk.

──────────────────────────────────────────────────────────────────────────────

10.1 Quarto

This document is generated using Quarto which enables to weave together content and executable code into a finished document. This is an improved version of what used to be called ‘Rmarkdown’. The document hides all code chunks, but they can be opened up with the drop down arrow. On the top right corner there’s a toggle for dark-mode. To learn more about Quarto see here.

References

Cui, Haissi, Jolene K. Diedrich, Douglas C. Wu, Justin J. Lim, Ryan M. Nottingham, James J. Moresco, John R. Yates, Benjamin J. Blencowe, Alan M. Lambowitz, and Paul Schimmel. 2023. Arg-tRNA synthetase links inflammatory metabolism to RNA splicing and nuclear trafficking via SRRM2.” Nature Cell Biology 25 (4): 592–603. https://doi.org/10.1038/s41556-023-01118-8.
Han, Hong, Ulrich Braunschweig, Thomas Gonatopoulos-Pournatzis, Robert J. Weatheritt, Calley L. Hirsch, Kevin C H Ha, Ernest Radovani, et al. 2017. Multilayered Control of Alternative Splicing Regulatory Networks by Transcription Factors. Molecular Cell 65 (3): 539–553.e7. https://doi.org/10.1016/j.molcel.2017.01.011.
Tapial, Javier, Kevin C. H. Ha, Timothy Sterne-Weiler, André Gohr, Ulrich Braunschweig, Antonio Hermoso-Pulido, Mathieu Quesnel-Vallières, et al. 2017. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms.” Genome Research 27 (10): 1759–68. https://doi.org/10.1101/gr.220962.117.
Weatheritt, Robert J, Timothy Sterne-Weiler, and Benjamin J Blencowe. 2016. The ribosome-engaged landscape of alternative splicing.” Nature Structural & Molecular Biology 23 (12): 1117–23. https://doi.org/10.1038/nsmb.3317.
Zhang, Mengjun, Zhuang Gu, Shuanghui Guo, Yingtian Sun, Suibin Ma, Shuo Yang, Jierui Guo, et al. 2024. SRRM2 phase separation drives assembly of nuclear speckle subcompartments. Cell Reports 43 (3): 113827. https://doi.org/10.1016/j.celrep.2024.113827.